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ABSTRACT 

This  report  documents  efforts  under  ONR  grant  no.  N00014-94-1-0676.  This  is  an  AASERT 
award  attached  to  parent  grant  NRL  no.  NOOO  1 4-93 -1-G022.  The  purpose  of  the  grant  is  to 
support  research  on  how  group  dynamics  can  emerge  from  collections  of  agents  that  would 
enable  them  to  make  decisions  that  individuals  could  not  or  accomplish  tasks  that  individuals 
could  not. 

Funding  from  the  grant  supported  four  graduate  students  directly;  i.e.,  with  stipends  and 
tuition,  and  a  number  of  undergraduate  students  indirectly,  through  materials  and  supplies 
purchases  to  support  their  independent  study  efforts  in  distributed  intelligence  and  cooperative 
robotics. 

Results  of  these  studies  indicate  that  among  distributed/cooperative  learning  methods,  the  most 
promising  and  appropriate  for  distributed  mobile  agent  applications  is  a  combination  of 
learning  and  behavioral  methods.  In  particular,  the  recommended  method  combines  the  data 
structures  and  execution  cycle  of  the  learning  classifier  system  with  reinforcement  computed 
similarly  to  Q-leaming  and  with  some  stochastic  selection  and  genetics-based  rule-paring 
methods.  These  systems,  in  conjunction  with  message-based  communications  between  agents, 
is  shown  to  be  widely  applicable  and  convergent  in  ideal  scenarios.  The  methods  have  the 
disadvantages  of  being  slow,  and  they  do  not  perform  well  in  sequential  learning  tasks  without 
significant  modifications. 


19971124  044 


)TIC  Q1MLIT2'  ISJAIlOISiD  8 


Overview 

The  purpose  of  the  parent  grant  for  this  AASERT  award  was  to  exanune  methods  for  the 
coordination  of  intelligent  agents  by  modeling  them  individually  as  dynamic  subsystems.  The 
problem  spaces  as  a  whole  then  represented  a  larger  space  into  which  the  agents  were 
embedded.  The  agents  would  couple  to  each  other  in  ways  that  would  result  in  the  effective 
emergence  of  a  single  collective  group  dynamic  which  is,  in  a  mathematically  analytical 
manner,  a  composite  system  integrally  dependent  on  each  subsystem  (agent).  The  agents’  own 
perception  of  the  coupling  signals  would  therefore  represent  knowledge  of  the  state  of  the 
population  as  a  whole.  The  problem  addressed  by  this  research  was  the  mechanism  by  which 
such  coupling  signals  could  be  implemented,  the  information  content  of  such  signals,  and  the 
behaviors  and  decisions  that  could  thereby  be  effected. 

The  original  approach  to  this  problem,  as  supported  under  the  parent  grant,  was  to  encode 
signals  and  behavioral  cues  into  the  wave  characteristics  of  coupled  nonlinear  oscillations. 
This  model  was  inspired  by  biological  systems,  such  as  those  that  control  circadian  rhythms, 
locomotory  systems,  and  learned  responses  to  sensoiy  stimuli.  Because  this  parent  grant  was 
terminated  prematurely  in  the  first  year,  this  original  approach  was  broadened  under  the 
AASERT  award  to  befit  more  general  application  domains,  while  still  limiting  the  studies  to 
distributed  AI  and  decision  theories  for  multiple  agents. 


Student  Projects  Supported 

This  AASERT  award  funded  tuition  and  stipends  for  four  graduate  students,  who  together 
completed  three  M.S.  degrees  and  is  soon  to  result  in  one  Ph.D.,  all  in  electrical  engineering. 
In  addition,  the  material  and  supply  budget  provided  by  the  grant  was  used  to  purchase 
electronic  components  and  software  used  by  undergraduate  independent  study  students  in  a 
supporting  role.  These  students  assisted  the  graduate  students  in  hardware  experiments, 
software  simulation,  and  proof  of  concept  projects.  Such  projects  included  the  construction  of 
radio-frequency  coupling  transceivers,  small  robotic  platforms,  and  software  modules  for  the 
implementation  of  the  devised  learning  and  coordination  techniques. 

Subsequent  sections  contain  the  thesis  abstracts  of  each  student.  Following  these  are  copies  of 
conference  and  journal  publications  authored  or  co-authored  by  these  students. 


“Architecture  Design  and  Simulation  for  Distributed  Learning  Classifier  Systems, "  M.S. 
thesis  by  Douglas  G.  Gaff,  1995. 

Abstract:  In  this  thesis,  we  introduce  the  Distributed  Learning  Classifier  system  (DLCS)  as  a 
novel  extension  of  J.  H.  Holland’s  standard  learning  classifier  system.  While  the  standard  LCS 
offers  effective  real-time  control  and  learning,  one  of  its  limitations  is  that  it  does  not  provide  a 
mechanism  for  allowing  communication  between  LCS  agents  in  a  multiple-agent  scenario. 
Often  multiple-agents  are  used  to  solve  large  tasks  collectively  by  subdividing  the  task  into 
smaller  parts.  Multiple  agents  can  also  be  used  to  solve  a  task  in  parallel  so  that  a  solution  can 
be  arrived  at  more  rapidly.  With  the  DLCS,  we  introduce  mechanisms  that  satisfy  both  of 
these  cases,  while  still  providing  compatible  operation  with  the  LCS. 
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We  introduce  three  types  of  messages  that  can  be  passed  between  DLCS  agents.  The  first,  the 
classifier  message,  allows  agents  to  share  learned  information  with  one  another,  thereby 
helping  agents  benefit  from  each  other’s  successes.  The  second,  the  action  message,  allows 
agents  to  “talk”  to  one  another.  The  third,  the  bucket  brigade  algorithm  payoff  message, 
extends  the  chain  rewarding  payoff  scheme  of  the  standard  LCS  to  multiple  DLCS  agents. 

Finally,  we  present  some  simulation  results  for  both  the  standard  LCS  and  the  DLCS.  Our 
LCS  simulations  examine  some  of  the  important  aspects  of  learning  classifier  system  operation, 
as  well  as  illustrate  some  of  the  shortcomings.  The  DLCS  simulations  justify  the  distributed 
architecture  and  suggest  future  directions  for  achieving  learning  among  multiple  agents. 

[Comments:  Mr.  Gaffs  research  is  the  first  known  distributed  system  adaptation  of  the  LCS. 
The  resulting  paper  (see  Appendix)  was  well-received.  The  results,  though,  served  to  illustrate 
limitations  in  the  learning  performance  of  the  LCS  that  was  to  be  addressed  in  the  next  M  S. 
thesis  project  to  be  supported.  Mr.  Gaff  is  currently  a  senior  software  engineering  for  Spatial 
Positioning  Systems,  Inc.,  of  Reston,  VA.] 


“A  Distributed  Q-learning  Classifier  System  for  Tack  Decomposition  in  Real  Robot  Learning 
Problems, "  M.S.  thesis  by  Kevin  L.  Chapman,  1996. 

Abstract;  A  distributed  reinforcement-learning  system  is  designed  and  implemented  on  a 
mobile  robot  for  the  study  of  complex  task  decomposition  in  real  robot  learning  environments. 
The  distributed  Q-leaming  Classifier  System  (DQLCS)  is  evolved  from  the  standard  Learning 
Classifier  System  (LCS)  proposed  by  J.  H.  Holland.  Two  of  the  limitations  of  the  standard 
LCS  are  its  monolithic  nature  and  its  complex  apportionment  of  credit  scheme,  the  bucket 
brigade  algorithm  (BBA).  The  DQLCS  addresses  both  of  these  problems  as  well  as  the 
inherent  difficulties  faced  by  learning  systems  operating  in  real  environments. 

We  introduce  Q-learning  as  the  apportionment  of  credit  component  of  the  DQLCS,  and  we 
develop  a  distributed  learning  architecture  to  facilitate  complex  task  decomposition.  Based 
upon  dynamic  programming,  the  Q-leaming  update  equation  is  derived  and  its  advantages  over 
the  complex  BBA  are  discussed.  The  distributed  architecture  is  implemented  to  provide  for 
faster  learning  by  allowing  the  system  to  effectively  decrease  the  size  of  the  problem  space  it 
must  explore. 

Holistic  and  monolithic  shaping  approaches  are  used  to  distribute  reward  among  the  learning 
modules  of  the  DQLCS  in  a  variety  of  real  robot  learning  environments.  The  results  of  these 
experiments  support  the  DQLCS  as  a  useful  reinforcement  learning  paradigm  and  suggest 
future  areas  of  study  in  distributed  learning  systems. 

[Comments:  Though  not  fully  reflected  in  the  abstract,  the  primary  contributions  of  Mr. 
Chapman’s  thesis  are  in  the  decomposition  of  sequential  learning  tasks  and  their 
implementation  on  robotic  hardware.  It  is  in  robot  learning  and  sensor  fusion  that  a  novel 
application  soon  presented  itself,  although  the  solutions  methods  were  to  eventually  differ  (see 
below).  Mr.  Chapman  is  now  a  software  engineer  for  the  Intelligent  Decision  Support  Systems 
Group  at  Raytheon  E-Systems  Corporation  in  Falls  Church  Virginia.] 
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“A  Fuzzy  Logic  Solution  for  Navigation  of  the  Subsurface  Explorer  Planetary  Exploration 
Robot,  ”  M.S.  thesis  by  Veronica  A.  Gauss,  1997. 

Abstract:  An  unsupervised  fuzzy  logic  navigation  algorithm  is  designed  and  implemented  in 
simulation  for  the  Subsurface  Explorer  planetary  exploration  robot.  The  robot  is  intended  for 
the  subterranean  exploration  of  Mars,  and  will  be  equipped  with  acoustic  sensing  for  detecting 
obstacles.  Measurements  of  obstacle  distance  and  direction  are  anticipated  to  be  imprecise 
however,  since  the  performance  of  acoustic  sensors  is  degraded  in  underground  environments. 
Fuzzy  logic  is  a  satisfactory  means  of  addressing  imprecision  in  plant  characteristics,  and  has 
been  implemented  in  a  variety  of  autonomous  vehicle  navigation  applications.  However,  most 
fuzzy  logic  algorithms  that  perform  well  in  unknown  environments  have  large  rule-bases  or  use 
complex  methods  for  tuning  fliiKy  membership  hmctions  and  rules.  These  qualities  make  them 
too  computationally  intensive  to  be  used  for  planetary  exploration  robots  like  SSX. 

In  this  thesis,  we  introduce  an  unsupervised  fuzzy  logic  algorithm  that  can  determine  a 
trajectory  for  the  SSX  through  unknown  environments.  This  algorithm  uses  a  combination  of 
simple  fusion  of  robot  behaviors  and  self-tuning  membership  functions  to  determine  robot 
navigation  without  resorting  to  the  degree  of  complexity  of  previous  fiizzy  logic  algorithms. 

Finally,  we  present  some  simulation  results  that  demonstrate  the  practicality  of  our  algorithm  in 
navigating  indifferent  environments.  The  simulations  Justify  the  use  of  our  fiizzy  logic 
technique,  and  suggest  future  areas  of  research  for  fuzzy  logic  navigation  algorithms. 

[Comments:  Ms.  Gauss  had  previously  worked  for  the  NASA  Jet  Propulsion  Laboratory  in 
the  Mars  Pathfinder  program.  The  subsurface  explorer  seemed  a  suitable  platform  for 
application  of  distributed  learning  methods,  and  also  provided  the  possibility  to  leverage  any 
successful  results  into  future  research.  However,  she  was  given  complete  freedom  to  explore 
and  determine  the  most  feasible  and  appropriate  control  method  for  the  SSX,  and  based  on 
significant  comparative  studies,  chose  the  fiizzy  controller  described.  Fuzzy  controllers, 
though,  show  little  promise  for  distributed  applications.  Ms.  Gauss  is  now  a  software  engineer 
at  Computer  Motion,  Inc.,  of  Goleta,  CA.  Computer  Motion  makes  robots  for  surgical 
applications] 


“Q-learning  in  a  Production  Rule  System  for  Applications  to  Control  Systems,  ”  tentative  title 
for  Ph.D.  thesis  by  Paul  J.  Johnson,  1997. 

Tentative  Abstract:  The  Learning  Classifier  System  (LCS),  originally  proposed  by  John 
Holland  in  1986,  combines  credit  assignment  (i.e.  reinforcement  learning)  and  rule  discovery 
into  an  adaptive,  message-passing,  production  rule  system  (PRS)  capable  of  learning  how  to 
both  plan  and  react  in  response  to  sensory  inputs.  While  the  LCS  shows  great  promise  in 
principle,  it  has  not  been  very  successful  in  control  system  applications,  most  likely  due  to  its 
method  of  credit  assignment:  the  Bucket  Brigade  Algorithm. 

To  improve  the  performance  of  Holland’s  original  LCS  in  control  system  applications,  this 
thesis  proposes  the  use  of  Q-leaming  as  a  replacement  for  the  credit  assignment  component  of 
the  LCS.  The  resulting  system,  termed  the  Q-leaming  Production  Rule  System  (QPRS), 
maintains  a  mle  discovery  mechanism  to  complement  the  reinforcement  learning  capabilities 
associated  with  Q-leaming.  The  QPRS  requires  minor  modifications  to  the  LCS  mle  discovery 
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mechanism  to  account  for  the  new  reinforcement  learning  component.  Likewise,  the  Q- 
leaming  algorithm  has  been  modified  slightly  from  its  original  form  to  function  within  the 
overall  structure  of  the  QPRS. 

Q-leaming  is  used  within  the  QPRS  as  a  reinforcement  learning  mechanism,  but  it  is  equally 
valid  to  consider  Q-learning  to  be  a  form  of  recursive  dynamic  programming.  In  this  sense,  Q- 
leaming  can  be  susceptible  to  the  same  problems  that  plague  dynamic  programming.  One  such 
problem  is  the  curse  of  dimensionality.  As  the  name  suggests,  as  the  number  of  discrete  state- 
action  pairs  increases,  full  enumeration  of  all  state-action  possibilities  can  require  prohibitive 
computing  resources.  However,  when  Q-leaming  is  combined  with  rule  discovery  in  the 
QPRS,  this  curse  of  dimensionality  can  be  attenuated.  By  providing  the  QPRS  with  the 
capabilities  to  discover  previously  untested  state-action  pairs,  it  is  not  necessary  to  fully 
enumerate  the  entire  state-action  space  throughout  all  time. 

When  the  states  and/or  actions  are  continuous  variables,  as  is  usually  the  case  in  control  system 
problems,  some  form  of  quantization  must  first  take  place.  The  concept  of  full  state 
enumeration  has  a  slightly  different  meaning  when  the  state-action  pairs  are  quantized  versions 
of  continuous  variables.  It  is  up  to  the  designer  to  choose  the  level  of  quantization.  This 
level  of  quantization  will  directly  affect  the  problems  associated  with  the  curse  of 
dimensionality.  There  is  a  tradeoff  between  increasing  the  resolution  (finer  quantization)  and 
making  the  problem  more  computationally  intensive.  This  tradeoff  leads  directly  to  one  of  the 
biggest  changes  we  have  introduced  to  the  QPRS.  We  have  modified  the  Q-leaming  algorithm 
within  the  QPRS  to  incorporate  interpolation  in  the  Q-value  update  equations.  We  make  use 
of  our  dynamic  programming  analogy  to  create  a  better  PRS  by  extending  its  interpolation 
algorithm  to  learning  systems.  Interpolation  allows  for  a  finer  degree  of  response  from  a  given 
level  of  quantization.  This  eliminates  the  need  to  more  finely  quantize  a  given  variable. 

The  ties  between  DP  and  control  systems  suggest  meaningful  cost  functions  and  specific  ways 
to  change  cost  functions  to  elicit  desired  responses.  This  is  not  the  case  with  the  BBA,  nor 
with  other  “ad  hoc”  cost  functions.  There  is  a  long  history  behind  DP,  and  we  wish  to  use  this 
experience  to  develop  cost  functions  to  use  with  Q-leaming  in  our  QPRS.  The  use  of  a 
modified  form  of  Q-leaming  has  provided  the  QPRS  with  the  capabilities  needed  for  successful 
application  to  control  system  problems.  With  the  development  of  the  QPRS,  a  new  tool  now 
exists  for  adaptive  optimal  control  in  the  absence  of  plant/model  information. 

[Comments;  This  study  focused  on  the  predictive  study  of  the  underlying  dynamics  of 
classifier-like  systems  and  systems  with  Q-leaming-like  reinforcement  methods.  It  provides  a 
more  analytical  fi'amework  for  the  study  of  learning  system  convergence.  Paul  Johnson  has 
finished  his  research  on  this  topic  and  has  only  to  complete  and  defend  his  dissertation.  He  is 
now  a  research  engineer  for  Raytheon  Corporation  in  Massachusetts.] 


Appendix 

Attached  are  copies  of  papers  authored  or  co-authored  by  students  supported  directly  and 
indirectly  by  project  funds,  and  papers  for  which  the  undergraduate  students  provided 
laboratory  assistance.  Additional  papers  will  appear  in  the  future. 
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-  lEEH  'R.AJ'oACTTONS  ON  SV5icNtS.  NtAN.  AND  CYBERN^  i  iCS— PART  3:  C^'BERN'tTICS.  VQL.  NO.  -i.  aL'GL’ST  199*^ 

A. Reactive  Coordination  Scheme 
for  a  Many-Robot  System 

Kirr^berly  Sharman  Evans,  Cem  Unsai,  Member,  and  John  S.  Bay,  Seruor  Member.  IEEE 


Abstract — This  paper  presents  a  novel  approach  for  coordi¬ 
nating  a  homogeneous  system  of  mobile  robots  using  implicit 
communication  in  the  form  of  broadcasts.  The  broadcast-based 
coordination  scheme  was  developed  for  the  ,4rmy  Ant  swarm — a 
system  of  small,  relatively  inexpensive  mobile  robots  that  can 
accomplish  complex  tasks  by  cooperating  as  a  team.  The  primary 
drawback,  however,  of  the  Army  Xnt  system  is  that  the  absence 
of  a  central  supervisor  poses  difficulty  in  the  coordination  and 
control  of  the  agents.  Our  coordination  scheme  provides  a  global 
•■group  dvmamic”  that  controls  the  actions  of  each  robot  using 
imly  local  interactions.  Coordination  of  the  swarm  is  achieved 
vvifh  signals  we  call  ‘*heartbeats,''  Each  agent  broadcasts  a  unique 
heartbeat  and  responds  to  the 'collective  behavior  of  all  other 
heartbeats.  We  generate  heartbeats  with  van  der  Poi  oscillators. 
In  this  application,  we  use  the  known  properties  of  coupled  van 
der  Pol  oscillators  to  create  predictable  group  behavior.  Some  of 
the  properties  and  behaviors  of  coupled  van  der  Pol  oscillators 
are  discussed  in  detail.  We  emphasize  the  use  of  this  scheme  to 
allow  agents  to  simultaneously  perform  an  action  such  as  lifting, 
steering,  or  changing  speed.  The  results  of  experiments  performed 
on  three  actual  heartbeat  circuits  are  presented  and  the  behavior 
of  the  realized  system  is  compared  to  simulated  results.  We  also 
demonstrate  the  application  of  the  coordination  scheme  to  global 
speed  control. 


L  Lntrodcction 

OBILE  robots  are  increasingly  being  considered  for 
industrial,  military,  and  scientific  applications.  Much 
of  the  robotics  research  to  dace  has  focused  on  improving 
'he  sophistication  of  individual  robot::  to  accomplish  more  de¬ 
manding  and  complex  tasks.  We  propose  chat  many  tasks,  such 
as  large  macenai  handling  problems,  are  best  achieved  using 
iarge  numbers  of  reiacivsiy  unintelligent  robots.  Typically,  a 
aistnbuted  approach  is  more  desirable  in  such  a  scenario  since 
'.ne  communications  overhead  necessary  for  central  control  is 
prohibitive  for  large  systems  of  mobile  robots,  or  s^varms. 
We  further  suggest  that  homogeneous  swarms,  which  are 
composed  of  similar  robots,  have  many  advantages  over 
heterogeneous  systems.  The  goal  of  the  Army  Anc  Project  [2], 

.Manuscript  recsived  January  2,  1995:  revised  February  U.  1996  and 
June  I  a.  1996.  This  paper  is  based  upon  work  supported  in  part  by  the 
Naval  Research  Laboratory  under  Grant  NOOOl— 93-i-b022.  Office  of  Naval 
Research  under  Grant  NOO 1 4-94- 1-0676,  and  the  National  Science  Foundation 
under  Grant  rRi-9202423.  Tr.e  content  of  this  paper  docs  not  necessarily  refiect 
'.r.e  position  or  policy  of  the  United  States  Government. 

:<-  Sharman  Evans  is  with  Digital  Directory,  Knoxville.  TN  37923  USA. 

C,  U.nsai  is  with  the  Robotics  Institute,  Carnegie  .Mellon  University, 
?i::sourgn,  PA  15213  USA  fe-mail;  unsai @ri.cmu.idu). 

J.  S.  3ay  is  with  tr.e  Braaley  Department  of  Electrical  Engineering.  Virginia 
.°Oiy:;cnnic  Lnsticuce  and  State  University',  Blacksburg,  VA  24061-0111  USA 
e-maii:  cay@vt.eau). 

P'ualisher  Item  Idenuner  S  1083-^-1 9(97)03373-8, 


[IS],  [19]  is  to  develop  a  homogeneous  system  of  autonomous 
mobile  robots  with  the  following  characteristics; 

•  The  Army  Ant  system  is  modular,  so  that  agents  are 
interchangeable.  Since  any  agent  can  assume  the  role  of 
any  failed  teammate,  the  system  is  immune  to  the  single 
point  failures  that  plague  heterogeneous  systems  where 
a  failure  in  the  chain  of  command  can  result  in  system 
failure.  The  failure  of  one,  or  even  several,  army  ants  will 
not  adversely  affect  the  performance  of  the  swarm.  The 
innate  robusmess  of  the  Army  Ant  swarm  is  perhaps  the 
most  critical  result  of  homogeneity. 

•  Tne  Army  Ant  swarm  is  more  dynamic  than  a  het¬ 
erogeneous  system  in  chat  it  may  divide  into  smaller 
groups  when  fewer  agents  are  required  for  a  task.  There 
is  never  a  risk  of  incomplete  hierarchies.  Furthermore, 
homogeneous  systems,  because  they  typically  do  not 
use  centralized  control  methods,  can  accommodate  large 
numbers  of  agents. 

•  .Army  ants  are  physically  small  and  relatively  unsophis* 
ticaced.  The  simplicity  of  the  ants,  combined  with  the 
feaaires  discussed  above,  will  allow  them  to  be  mass 
produced  fairly  inexpensively.  Tne  intention  is  to  create 
a  system  that  can  tolerate  the  loss  or  a  few  robots,  both 
in  terms  of  cost  and  system  performance.  A  possible 
drawback  of  homogeneity  is  the  possibilicv'  of  agents 
being  overqualified  for  a  job.  but  the  robustness  of  the 
system  more  chan  compensates  for  this  cost. 

.Army  anc  agents  are  completely  autonomous  and  have  no 
a  priori  information  about  their  environment.  Tneir  behavior 
is  reactive  in  that  individual  agents  respond  to  stimuli  in 
the  form  of  sensory  inputs.  This  c>'pe  of  behavior,  known 
as  sensor  driven  behavior,  offers  greater  flexibiiicy  to  cope 
with  changing  environments.  Unlike  centralized,  planner  based 
approaches,  wherein  robots  expend  resources  gadiering  and 
processing  information,  a  reactive  approach  allows  robots  to 
respond  quickly  to  changes  in  the  environment. 

Sensor  driven  behavior  is  consistent  with  a  distributed 
control  approach.  .An  agent’s  control  algorithm  determines  an 
action  based  on  the  local  information  that  fiows  from  sensory 
inputs.  Group  behavior  is  achieved  through  the  simple  actions 
of  many  unintelligent  agents.  For  example,  an  agent  is  attracted 
CO  a  payload  destination  by  receiving  an  infrared  signal  from 
a  beacon  that  is  placed  at  the  proper  location.  Each  agent 
responds  independently  to  the  beacon,  but  the  actions  of  each 
individual  result  in  the  group  of  agents  gathering  at  the  beacon. 

The  flexible  nature  the  .Army  Anc  swarm  lends  itself  to 
many  different  applications.  .Army  ants  are  well  suited  for 
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;az2rdoas  operctions  such  us  mine  s’>veeping,  nuclear  po'^er 
:Lnc  maintenance  work  and  miiitarv'  applications,  where  the 
-n'dronment  is  unsafe  for  humians,  and  the  ns!<  tactor  is  coo 
nich  to  utilize  expensive,  highly  specialized  robots. 

The  .\rmy  Ant  approach  is  not  intended  to  be  the  answer 
:o  ail  autonomous  robot  applications.  However,  the  tact  that 
not  ail  robotic  tasks  require  highly  intelligent  agents  and  that 
a  croup  of  seif-or^anizin^  agents  can  exhibit  a  higher-le\el 
benavior/intelligence  emerging  as  a  result  of  agent  interactions, 
.nakes  this  approach  attractive.  It  is  obvious  that  Army  Ant 
robots  are  not  appropnace  ror  cas^s  requiring  specialized 
I  division  of  .labor  such  as  assembling  a  complex  machine. 

I  Our  approach  is  appropriate  to  teams  or  unskilled  laborers 
with  little  differentiation  of  responsibilities.  It  is  not  a  valid 
proposition  where  there  is  no  economy  ot  scale,  where  a  few 
intelligent  asents  are  far  better  than  a  swarm  of  unintelligent 
agents. 

A.  Cooperative  Behavior 

The  ability  of  the  Army  Ant  sw'aim  to  accomplish  complex 
coals  relies  upon  implicit  cooperation  between  individual 
agents.  It  is  essential  to  distinguish  between  expiicic  and 
I  oiici:  cooperation  when  describing  the  behavior  of  a  system  of 
mobile  robots.  Mataric  [16]  dennes  explicit  cooperation  as  ”a 
set  of  interactions  between  agents  which  involve  e.xchanging 
information  or  perfoiming  actions  in  order  to  help  ocher  agents 
achieve  their  soals.*’  In  contrast,  implicit  cooperation  consists 
of  '^actions  that  are  a  part  of  the  agent's  own  goal-achieving 
behavior,  alchoug.h  they  may  have  effects  in  the  world  chat 
help  ocher  agents  achieve  their  goals. ' 

We  reccsnize  implicit  cooperation  as  the  type  of  interaction 
found  quite  frequently  in  natural  systems.  Insects  are  not 
aitniistic.  yet  colonies  of  insects  collectively  accomplish  goals 
such  as  transporting  food  and  building  structures  [S],  [9],  [12]. 
Vanous  studies  of  insects  have  shown  that  colonies  operate  in 
I  a  aistnbuced  fashion,  where  individual  insects  follow  a  rew 
simple  rules.  Likewise,  simple  interactions  between  relatively 
uninceiliaenc  army  ants  will  produce  rather  complex  system 
behavior'^  Beni  and  Wang  [4]  refer  to  this  phenomenon  as 
swarm  incelligence, 

.-\lth0u2h  both  explicit  and  implicit  communications  may 
■  complement  each  ocher,  as  suggested  by  several  researchers 
[1].  [12],  use  of  expiicic  communications  has  its  problems  in 
a  swarm  of  the  size  envisioned  by  our  approach.  Bandwidth 
issues  is  one,  as  well  as  the  need  to  '‘address”  the  agents, 
which  will  cause  the  swarm  to  lose  its  homogeneity.  Explicit 
communications  also  adds  to  cost  and  complexity.  These  are 
two  important  issues  we  want  to  avoid  in  our  scenario,  h 
has  been  shown  that  “decentralized  control  without  expiicic 
communication  can  be  used  in  performing  cooperative  tasks 
requiring  a  collective  behavior”  [12]. 

Tlaere  are  many  instances  when  agents  must  not  just  coop¬ 
erate.  but  do  so  in  a  coordinated  manner.  While  to  cooperate 
.means  to  ‘work  coward  a  common  goal,  to  coordinate  means 
:o  oenbrm  a  common  action  or  movement  in  a  harmonized 
manner.  In  the  material  transport  example,  agents  may  have  to 
lift  a  oailec  simultaneously  so  that  the  payload  stays  level;  also. 


Fig.  1.  Group  dynamics  in  ihe  .Army  .Anc  scenario. 


cransponing  the  payload  is  a  much  easier  task  when  agents 
steer  together.  Various  agent  actions  may,  at  some  time  or 
another,  have  to  be  performed  synchronously.  In  this  paper 
we  present  a  broadcast-based  coordination  scheme  chat  allows 
us  to  achieve  multi-agent  coordination  without  programming 
the  behavior. 

3.  Swarm  Coordination 

Homogeneous  systems  of  agents,  by  nature,  lack  a  control 
structure  for  coordination.  There  exists  no  hierarchy  of  com¬ 
mand  by  which  lower  ranked  agents  follow  the  actions  of  their 
superiors.  For  a  many  agent  system,  explicit  message  passing 
is  likely  to  create  a  communication  bottleneck.  Only  indirect 
communication,  in  the  torm  of  broadcasts  or  cues,  offers  a 
practical  solution  to  the  swarm  coordination  problem. 

We  define  a  cue  as  a  prompt  that  a  robot  perceives  trom 
its  environment.  .A  broadcast  consists  of  information  that  is 
transmitted  indiscriminately,  so  chat  all  robots  receive  the 
same  information.^  Using  broadcast  signals  we  show  that  we 
can  create  a  “group  dynamic”  that  ail  agents  can  sense  and 
influence,  but  which  does  not  reside  in  any  individual. 

As  illustrated  in  Fig.  1.  agents  interact  with  their  envi- 
ronment  through  sensors,  actuators  and  broadcast  signals. 
However,  they  are  not  permitted  to  address  each  other:  ad¬ 
ditionally,  the  actions  of  an  agent  have  no  direct  effect  on 
other  agents  (or  have  explicit  “interpretation”). 

The  group  dynamic  is  influenced  and  generated  by  the  ac¬ 
tions  of aU  the  robots,  but  is  dominated  by  none  in  panicuiar.  It 
is  sustained  by  the  contribution  of  the  population.  It  influences 
the  behavior  and  decision  of  the  robots,  but  allows  the  robots  to 
function  with  local  sensing,  short  broadcast  communications, 
and  no  need  for  global  information  or  maps.  It  has  no  power 
of  direct  actuation,  but  guides  robots’  behavior,  like  a  group 
conscience. 

The  global  dynamics  of  the  system  proposed  will  obviously 
have  nonlinear  characteristics.  The  system  must  be  adaptive 
and  sensitive  to  changes  in  individual  “states  ’  of  the  agents. 
To  achieve  such  a  system,  we  propose  the  use  of  signals  we  call 
“heartbeats,”  of  which  each  agent  has  at  least  one,  and  which 
can  respond  to  the  collective  behavior  (or  global  dynamics  of 
the  environment/system)  of  ail  other  robots'  heartbeats. 

■  For  e.tampie.  a  locator  beacon  signal  is  a  cue.  while  a  robot  s  signal  lor 
help  is  ^  broaacast. 
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Dirfereac  scenarios  where  explicit  communicadons  are  com¬ 
bined  w'ich  che  “heanbeacs”  approach  can  also  be  considered 
:o  obiain-more  precise  swarm  coordinacion  at  the  expense  of 
cost  and  complexicy.  Agents,  may  be  given  idendties  sepai-ace 
from  their  heanbeats  so  that  it  would  be  possible  to  ask  one  to 
'  \  cum  its  heanbeac  off  to  elect  icseif  a  leader  (see  Section  II-A). 
Or  a  group  of  agents  cceuld  solicit  another  agent  to  join  in  by 
asking  ic  to  cum  its  heartbeat  on. 

Each  agent  broadcasts  a  unique  primary  heartbeat  and 
responds  to  the  collective  behavior  of  all  ocher  heanbeats; 
they  do  not  use  the  heanbeats  to  identify  or  address  one 
another.  We  generate  the  ever-present,  hardwired  heartbeats 
W'ich  nonlinear  oscillator  circuits  called  van  der  Pol  oscillators. 
The  oscillators  form  a  coupled  network  w'hen  each  agent  adds 
components  of  others’  heanbeats  to  its  own.  Over  a  large 
range  of  frequencies  and  coupling  factors,  coupled  van  der  Pol 
oscillators  will  synchronize  their  outputs  [10].  This  propeny 
is  known  as  frequency  entrainment} 

Bay  and  Hemami  [3]  showed  chat  coupled  van  der  Pol 
oscillators'*  could  be  used  to  model  'the  central  pattern  gen¬ 
erators  (CPG)  that  stimulate.  Itmb  commands  used  in  human 
walking  and  jumping.  Cohen  [6]  used  coupled  oscillators  to 
model  the  swimming  moaon  of  the  sea  lamprey.  His  research 
points  to  the  plausibility  that  the  sw'imming  speed  of  a  rish 
is  controlled  by  an  initial  alteration  of  the  frequency  of  an 
individual  oscillator  pair  that  results  in  all  ocher  oscillators 
entraining  to  the  new  frequency.  We  use  van  der  Pol  oscillators 
to  mimic  this  type  of  behavior  in  our  system  of  mobile  robots. 
Other  researchers  have  realized  chat  CPG’s,  which  control 
without  central  intelligence,  hold  tremendous  potential  for 
robotics  applications.  Crisman  and  Ayers  [7]  have  formed  a 
pannership  to  design  a  mobile  robot  suitable  for  operation  in 
shallow  water.  The  eight-legged  walking  machine  is  patterned 
after  the  American  lobster,  which  is  capable  of  walking  in 
any  direction,  including  laterally.  Their  CPG  based  controller 
coordinates  and  controls  the  robot’s  motion. 

We  exploit  the  frequency  entrainment  property  to  develop  a 
global  group  dynamic  that  adaptively  controls  and  coordinates 
a  swarm  of  agents.  Because  synchronization  is  an  inherent 
property  of  the  coupled  oscillators,  we  need  not  program  the 
behavior.  Synchronization  occurs  whenever  agents  ’‘listen”  to 
:he  heartbeats.  Wliile  a  more  cradicionai  approach  such  as 
broadcasting  digital  information  over  an  ethemec  is  an  option, 
ic  is  a  far  more  sophisticated  solution  than  the  coordination 
problem  demands.  Our  coupled  oscillator  approach  to  the 
coordination  of  a  swarm  of  agents  is  a  more  ‘‘natural”  and 
simplistic  solution. 

This  ’‘reactive  broadcast”-c>'pe  of  communication  enables 
the  swarm  to 

•  sense  when  agents  are  added  or  dropped  from  the  group; 

•  naturally  choose  a  group  leader: 

-  Frequency  entnmment  is  a  phenomenon  chat  occur?  when  a  periodic  force 
:s  aspiied  :o  a  system  whose  oscillation  is  free  and  seif  excicect  and  the 
ietf-exciced  osculation  fails  into  synchronizacioa  with  the  driving  frequency 
.’3|. 

’  Van  vier  Pol  osciilacors  are  widely  used  for  entninmeni  and  biological 
T:odeiing  in  past  literature.  This  fact  and  characteristics  such  as  robustness 
ana  Simplicity  motivated  us  to  use  VDP  for  our  application. 


Fig.  2.  Single  van  der  Pol  oscillator.  (Coupling  factors  for  a  multiple 
oscillator  scheme  are  also  shown.) 

•  react  to  a  single  agent,  i.e.,  any  individual  can  effect  a 
change  in  the  group  behavior. 

C.  Hardware  Goals 

W'e  show  that  our  broadcast-based  coordination  scheme  can 
be  used  for  multi-agent  coordination  in  two  ways.  First,  we 
demonstrate  the  synchronization  of  three  agent  heanbeats. 
We  do  this  by  realizing  three  van  der  Pol  oscillators  whose 
signals  we  broadcast  and  receive  using  an  FM  communication 
link.  Our  aim  is  to  build  an  inexpensive  hardware  system 
whose  performance  closely  matches  simulated  results.  Next 
we  use  the  coupled,  three  oscillator  network  to  develoo  a 
global  speed  controller.  We  integrate  one  heanbeac  circuit 
into  an  army  ant,  while  allowing  the  ocher  tw'o  circuits  to 
operate  as  standalone  (only  one  army  ant  was  available  for 
this  experiment).  We  show  that  when  the  heanbeats  couple, 
the  agents  are  commanded  to  travel  at  a  common  speed.  By 
increasing  the  coupling  coefnciencs.  w'e  are  able  to  effect  a 
global  increase  in  speed. 

VDP  oscillators  will  broadcast  and  receive  heanbeats  using 
an  inexpensive  FM  communication  system.  Tne  FM  link 
introduces  some  distonion  into  the  network  of  coupled  oscilla¬ 
tors,  aitenng  its  behavior  somewhat.  However,  the  deviations 
produced  by  coupling  the  system  with  FM  do  not  adversely 
affect  performance  goals  of  our  system. 

II.  Van*  Der  Pol  Oscillators 

The  nonlinear  oscillators  used  in  our  distributed  system  are 
described  by  the  well-known  van  der  Pol  equation  which  is 
used  to  describe  an  RLC  circuit  with  a  nonlinear  resistor, 
or  equivaJenciy,  a  mass-spring-damper  system  with  a  position 
dependent  damping  coefncienc  [17] 

The  block  diagram  in  Fig,  2  shows  the  construction  of  a  single 
oscillator  with  p  =  =  1.  It  has  two  integrators,  square- 

law  and  multiplier  nonlinearities,  and  a  gain  parameter  u;-  chat 
cofTcsponds,  roughly,  to  the  squared  frequency  of  oscillation. 
Coupling  is  effected  through  the  summinz  junction. 

Other  signals  £j  are  coupled  to  oscillator  i's  signal  cr; 
through  a  coupling  coefficient  Xij.  which  can  be  positive 
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.Ci[0r>'^  or  negative  (inhibitory)  (see  also  [5]).  Tl^*e  absolute 
ue  of  a  coupiine  factor.  A,  can  range  in  value  trona  0  tor 
coupling  to  1.0  tor  full  coupling. 

The  primar/  broadcast  heartbeat  ,  is  invanant  in  order  to 
Did  confusion  when  task-specihc  roles  are  assumed.  Th.e 
emal  signal(s)  and  any  secondary  broadcast  signals  may 
modified  by  coupling  terms  to  affect  the  agent's  behavior, 
i  may  use  as  many  different  signal  channels  (each  coupled 
different  network  topologies)  as  necessary  to  accomplish 
r  tasks,  but  we  seek  to  keep  these  to  an  absolute  minimum 

m. inimize  complexity  and  force  the  issue  of  emergent 
.'Operative  dynamics, 

Use  Of  van  der  Pol  Oscillators  in  a  Midn-A^ent  System 

3y  adjusting  the  coupling  coefficients  between  osciila- 
s/azents,  it  is  possible  to  use  van  der  Pol  oscillators  tor 
ferent  purposes  such  as  leader  selection  and  synchroniza- 

n. 

/)  Leader  Selection:  To  implement  our  technique  for  leader 
ection.  we  suggest  that  the  oscillator  frequency  given  to 
:h  agent  would  be  chosen  randomly  at  the  factory'.  If  the 
cuencies  vary  continuously,  then  in  all  likelihood,  each 
^nt  will  have  a  unique  frequency  to  differentiate  itself, 
:eciaily  if  the  team  size  is  relatively  small  (e.g.,  <10  in 
aopulation  of  100  robots).  Note  that  we  still  do  not  allow 
asent  to  address  another:  the  heartbeat  frequencies  are  used 
'  discrimination  rather  than  identification. 

2)  Svnchronizotion:  Several  phases  (e.g..  pallet  lifting, 
erine  or  changing  speed')  of  the  Army  Ant  scenario  require 
It  the  asents  act  as  a  team.  Therefore,  they  must  somehow 
nchronize  their  actions.  Tne  agents  implicitly  formed  a 
:m  w'hen  they  couoled  to  the  other's  broadcast.  As  a  team, 
ly  form  a  fully-connected  network.  Tne  coupled  network 
oscillators  can  entrain  so  that  the  internal  heartbeats  of 
:h  asent  oscillate  at  the  same  frequency  as  its  teammates  . 
sardiess  (almost)  of  the  original  frequency  assigned  to  it. 
Tne  positively-coupled  network  is  known  to  entrain  to  a 
mmon  frequency  for  a  broad  range  of  individual  frequencies 
0).  [15].  Fig.  3  shows  a  simulation  of  tour  w'aves  of  different 
innsic  frequencies  entraining  with  all  coupling  coefficients 
:  :o  -0.3.  Then,  one  of  the  agents  stops  "listening"  to 
'.mmates,  thus  becomes  the  leader.  The  entrained  frequency 
different  (higher)  than  any  of  the  original  frequencies.  Note 
0  chat  entrainment  requires  the  original  frequencies  to  be 
isonably  similar  to  one  another/  although  higher  harmonic 
trainment  is  possible  [10], 

Entrainment  to  a  common  frequency  is  used  as  the  synchro- 
:ation  technique.  When  entrainment  is  detected,  the  common 
ive  may  be  used  as  a  clock  so  that  ensuing  actions  might 
performed  simultaneously  thereafter.  If  we  need  to  elect 
leader  that  the  others  should  follow  (for  example,  when 
rectionai  heading  must  be  arbitrated),  then  that  leader  can 
nply  change  its  own  coupling  coefficients  to  zero,  so  that 
osCiiLces  ic  its  original  frequency,  while  the  others  remain 
t.-ained.  Tnis  results  in  a  set  of  waves  that  are  entrained  to 

-  ’ '2  JO  30^  csvtacton  from  ths  .msdioft  frstquency  is  pcnnissible  in  order 
ootam  ir.cnirimenc 


601 


Oscillator  Entrainment 


Fig.  3.  Heanbeat  e.acrainrr.enc  :o  a  common  rraq'cer.cy;  :uil  coupiing.  and 
iben.  leader  followins.  Note  d^e  cnange  in  arr.piicjce  aae  to  decoupling. 

the  leader,  as  opposed  to  being  mutually  entrained;  however, 
their  phases  may  differ. 

III.  COORDLVATION  SCHEME 

We  demonstrate  multi-agent  coordination  by  realizing  three 
heartbeat  circuits  and  obser-'ing  the  entrainment  behavior  over 
a  wide  range  of  coupiing  factors.  To  achieve  wireless  coupling 
of  the  oscillators,  we  use  an  inexpensive  FM  communication 
link.  While  our  results  refiecc  some  phase  distortion  in  the 
received  heanbeat  signals,  we  show  chat  the  FM  coupling  does 
not  adversely  affect  the  performance  goals  of  our  system. 

A.  Oscillator  Realizcition 

The  oscillator  shown  in  Fig.  2  represents  a  heartbeat  circuit, 
of  which  each  agent  has  one.  The  input  signals  Xi  through 
Xn  represent  the  heartbeats  of  all  other  agents  whose  signals 
are  within  "heanng"  range.  The  received  signals  are  coupled 
through  a  summing  Junction  and  added  to  the  oscillator  5 
feedback  path. 

For  our  research  we  use  a  three  oscillator  model,  in  which 
each  heanbeat  couples  to  two  other  heanbeat  signals  (Fig.  4). 
The  heanbeat  circuits  are  identical  except  that  each  is  assigned 
a  unique  value  for  the  frequency  parameter  w-.  For  sim.plicity, 
we  choose  m  as  unity.  Wlien  ^  is  large  it  is  extremely  difficult 
to  predict  the  regions  of  entrainment  for  a  coupled  oscillator 
network.  It  has  been  shown  theoretically  that  for  small 
we  can  predict  the  regions  of  frequency  entrainment  for 
a  van  der  Pol  oscillator  excited  by  a  driving  frequency. 
Simply  stated,  if  the  frequencies  are  not  too  different  harmonic 
entrainment  will  occur.  In  the  case  of  harmonic  entrainment 
an  oscillator  synchronizes  to  the  driving  frequency.  If  the 
frequency  separation  is  large,  but  the  ratio  of  the  frequencies 
is  in  the  neighborhood  of  an  integer  or  a  fraction,  frequency 
entrainment  may  still  occur.  The  latter  type  of  entrainment 
is  called  higher-harmonic  or  subharmonic  entrainment  since 
the  oscillator  entrains  to  a  frequency  that  is  a  multiple  or 
submultiple  of  the  driving  frequency  [10].  When  fj.  is  made 
large  the  machemadcal  analysis  of  the  van  der  Pol  equation 
becomes  very  complex,  and  we  are  unable  to  predict  the 
regions  of  entrainment.  However,  it  has  been  shown  through 
analog  simuladons  chat  two  mutually  coupled  van  der  Pol 
oscillators  exhibit  stable  limit  cycle  oscilladon  for  values  of  m 
ranging  from  0.1  to  l.O  [14].  Funhermore,  our  own  simulations 
have  indicated  that  frequency  entrainment  occurs  for  large 
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Fig.  (a/  Full  and  I'b)  nng  couplings  or  :hree  oscillators. 


numbers  of  mucuaily  coupled,  van  der  Pol  oscillacors  when 
^  =  I.O. 

Wliile  the  oscillator  realization  is  almost  a  straightfor.vard 
adaptation  of  the  block  diagram,  a  direct  synthesis  of  the 
block  diagram  iji  Fig-  2  yields  a-ver>*  low  frequency  oscillator 
with  a  limited  frequency  range.  Thus,  we  time  scaled  all  the 
oscillators  by  the  same  factor  (xlOOO)  and  used  jj-  to  tune 
each  oscillator  to  a  unique  frequency. 

B.  Wireless  Coupling 

In  choosing  a  communications  medium  to  transmit  agent 
heartbeats,  several  criteria  were  considered.  The  coordination 
schem^e  requires  a  communication  system  that  is  inexpensive, 
reliable,  suitable  for  indoor  or  outdoor  operation,  and  has  rea¬ 
sonable  range.  There  are  several  options  for  low  cost  wireless 
data  links.  For  short  range  communication  links,  infrared  and 
ultrasound  are  two  popular  media,  but  to  synchronize  a  large 
sw'arm  of  agents  spread  out  over  a  large  area  w'e  need  more 
range  than  these  media  offer.  The  primary  disadvantage,  how¬ 
ever,  in  usina  infrared  or  ultrasonic  communication  links  is  that 
their  directional  transmission  beams  cannot  provide  reliable 
coupling.  Object  interference  or  an  inability  to  maintain  a  line 
of  sight  are  likely  to  prevent  two  agents  from  keeping  their 
heartbeats  locked.  Thus,  an  omnidirectional  communication 
link,  such  as  an  R5  link,  is  most  desirable  for  synchronizing 
a  group  of  agents  through  mutual  coupling. 

In  keeping  with  the  A\rmy  Ant  specifications,  a  simple 
and  low  cost  RP  communications  link  is  used  to  couple  the 
oscillators.  Oscillator  signals  are  broadcast  using  miniature 
tunable  FM  transmitters.  Portable  digitally  tuned  FM  radios 
are  used  as  receivers. 

The  ranse  of  our  broadcast  system  is  dependent  upon  the 
type  of  antenna  used  with  the  F2vl  transmitters.  The  transmitters 
have  a  ranse  of  up  to  1/3  mi  with  a  12-in  wire  antenna, 
but  that  range  may  be  extended  as  much  as  I  mi  with 
a  more  sophisticated  antenna.  The  system  can  be  operated 
continuously  for  approximately  two  days  on  a  9  V  battery. 

C.  Computer  Control  of  Oscillctor  Parameters 

To  maintain  autonomy  in  our  .Axmy  Ant  agents,  we  rnusc 
give  them  the  capability  to  control  their  own  oscillator  pa¬ 


rameters,  Some  aspects  of  swarm  coordination  require  more 
than  Just  synchronization  at  a  fixed  frequency.  Tne  speed 
controller  to  be  discussed  in  Section  IV  relies  on  changes  in 
coupling  strength  to  effect  global  changes  in  the  speed  at 
which  agents  are  commanded  to  travel.  Thus,  agents  shall 
be  capable  of  changing  their  coupling  strength,  as  well  as 
coupling  or  decoupling  themselves  from  the  heanbeat  and 
varying  their  frequency  parameters.  We  assume  that  each 
agent  can  sense  when  to  couple  or  decouple  its  oscillacors. 
One  obvious  situation  in  which  an  agent  would  voluntarily 
decouple  itself  from  the  network  is  when  it  detects  that  its 
''health”  is  failing.  For  example,  an  agent’s  battery  voltage 
can  easily  be  monitored,  so  when  the  battery  discharges  to 
the  point  chat  it  can  no  longer  support  a  critical  load  such 
as  the  drive  motors,  the  agent  realizes  its  futility  and  takes 
appropriate  action.  Under  normal  circumstances,  agents  are 
not  permitted  to  change  their  pre-assigned  frequencies.  Yet  in 
the  case  that  an  agent  detects  a  failure  mode  it  can  decrease  its 
frequency  to  avoid  any  possibility  of  being  elected  a  leader. 

To  provide  automatic  parameter  control  we  use  a  voltage 
controlled  amplifier  incerraced  to  an  up/down  counter.  Agents 
increase  or  decrease  their  parameters  by  either  counter  up  or 
down,  W'hen  an  agent  wants  to  decouple  from  the  network  it 
counts  down  completely,  producing  a  control  voltage  of  zero 
volts:  consequently,  incoming  heartbeats  are  ‘’coupled”  with  a 
gain  of  zero  (i.e.,  =  0  for  j  =  i). 

D.  Experimental  Results 

We  simulated  three  uncoupled  oscillacors  with  frequency 
parameters -a; j  =  1.3,  and  a^'5  =  1.4.  Using  the  same 

settings,  we  acquired  data  from  the  hardware  implemented 
oscillators  and  compared  the  results.  Table  I  shows  the  oper¬ 
ating  frequencies  for  the  simulated  and  acnial  oscillacors.  Tne 
operating  frequencies  of  oscillator  I  and  oscillator  2  matched 
the  simulation  within  2,59c,  Oscillator  3  differed  by  7.59o  from 
the  simulation.  These  errors  could  be  made  extremely  low 
if  precision  resistors  are  used  in  ’ouilding  the  oscillacors.  and 
gains  are  carefully  trimmed  to  the  exact  values.  Keeping  m 
mind  that  a  time  scaling  factor  of  1000  was  used,  one  sees 
that  the  frequency  parameter  roughly  determines  the  oscillator 
frequency  as  expected. 
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TABLE  I 

Oscillator  Freqcencies  for  the  Slmil.mhd 
AND  Actcal  Three  Oscillator  Net-vcrr 


Oscuiator 

1 

Oscillator 

2 

Oscillator 

3 

Frcq.  ?inime:er 

o,-=  1.2 

cji-  -  1.3 

03-,-  =*  1.4 

Simuhcfid  (Hz) 

1  1245 

1  1301 

j  1352 

.A era  3Li  (Hz)  j 

1  1272 

1  1329 

!  1454 

Oscillator  1 


1  - - - 

0.1  0.3  0.5  0.7  1 


coupling  factor 

5.  Racio  of  encrained  frequency  co  natural  frequency  versus  coupling 
or  for  the  simulated  (a)  and  actual  (■) 'oscillator  1{— “  =  1.2). 


Amplitude  vs.  Coupling  Factor 


0.1  0.3  0.5  0.7  1 

couplirg  factor 


6.  Amoiitude  versus  coupling  factor  for  simulated  (a)  and  acnial  (■) 
.sccors. 

>Ve  refer  to  the  frequencies,  l'v,  in  Table  I  as  the  natural 
Tuencies.  In  the  following  e.xperiments  we  use  the  ratio 
ahe  entrained  frequency  co  the  natural  frequency  as  a 
IS  for  comparing  actual  and  simulated  results.  Fine  we 
npare  the  results  of  our  oscillator  network,  with  coupling 
icted  by  hardwired  connections,  to  the  simulated  network, 
measured  the  entrained  frequency  and  the  signal  amplitude 
coupling  factors  ranging  from  A  =  0.1  to  A  =  1.0.  In  each 
e  ail  oscillators  were  set  to  the  same  coupling  factor.  In 
.  5.  which  shows  the  entrained  frequency  versus  coupling 
tor.  we  see  that  the  performance  of  the  actual  oscUlacor 
matches  the  simulated  results  extremely  well,  with  the 
iition  increasing  slightly  with  increasing  w'.  Comparable 
;uics  were  obtained  for  oscillators  2  and  3.  The  amplitude 
^  :.he  entrained  frequency  versus  coupling  factor  is  shown 
:  r\z.  0.  Two  trends  are  evident  from  our  data:  the  ratio 
Ljn  trained  frequency  to  natural  frequency  increases  almost 
;ariy  as  the  coupling  factor  increases,  while  the  amplitude 
meases  with  increasing  coupling  strength. 
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Oscillator  1 


0.1  0.3  0.5  0,7 

coupling  factor 


Fig.  7.  Ratio  of  e.ntrained  frequency  -o  natural  frequency  versus  coupling 
factor  for  the  simulated  (a)  ana  actual  (a)  oscillator  I  usins  F>T  couotine 
=  1.2).  ■  '  ' 


Amplitude  vs.  Coupling  Factor 


0.1  0.3  0.5  0.7  1 

coupling  factor 

Fia.  S.  .Amplitude  versus  coupling  factor  for  simulated  (a)  and  actual  (B) 
oscillators  with  FM. 

Figs.  7  and  8  show  the  results  for  oscillator  I  when  the  same 
experiments  are  repeated,  but  we  use  an  F^[  link  to  coupie 
the  oscillators.  .Again,  oscillators  2  and  3  behaved  similarly 
to  oscillator  1.  With  FM  coupling  the  oscillators  entrained  to 
much  lower  frequencies  than  when  a  hardwired  connection 
was  used.  Additionally,  the  increase  in  coupling  strength  did 
not  have  as  much  effect  on  the  oscillator  amplitudes  in  the 
FM  coupled  network.  There  is  significant  error  between  the 
performance  of  simulated  oscillators  and  the  FM  coupled 
oscillators,  but  more  imponantly,  the  trends  are  still  similar. 
Just  as  imponant  is  chat  the  behavior  of  the  FM  system 
is  consistent.  The  relationship  between  frequency  ratio  and 
coupling  factor  is  slightly  more  linear  for  the  simulated 
oscillators.  The  frequency  ratio  verses  coupling  factor  data 
for  the  F^I  coupled  oscillators  looks  similar  to  the  simulated 
results,  but  rises  with  a  smaller  slope.  The  error  is  greatest 
at  the  highest  coupling  factor.  The  amplitude  data  starts  out 
with  simulated  and  FM  coupled  oscillators  matching  very  well 
at  low  coupling,  has  the  greatest  error  between  A  =  0.3  and 
A  =  0.5,  and  shows  little  error  at  higher  coupling  factors. 

A  large  portion  of  the  error  in  the  FM  data  can  be  attributed 
to  the  manner  in  which  the  data  was  acquired,  rather  chan  to 
the  FM  system.  When  the  heanbeat  circuits  were  connected  to 
the  data  acquisition  board,  the  transmitters  were  as  close  as  1 
foot  from  each  other.  Unlike  most  miniature  transmitters, 
which  have  a  5  mW  output  power,  our  transmitters  have  a  75 
mW  power  output.  While  the  higher  output  power  affords  a 
greater  broadcast  range,  at  close  range  there  is  considerable 
interference  benveen  devices.  Distonion  is  noticeable  in  the 
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HarCiVired,  ccupling=.1.  w1  =  1.2,  w3=1.4 


0.232  0.233  0.234  0.235  0.236  0.237  0.233  0.239  0.24  0.241 


time  (sec)  . 

Fig.  9.  Hardwired  coupled  osciilactjr  '^necwork  encrained  :o  1472  Hz 
:  A  =  0.1).  Oscillatocs  begin  to  entrain  at  0.236  s. 

received  heanbeac  signals,  so  we  expecc  the  behavior  of  che 
van  der  Pol  oscillacors  to  be  affected.  In  a  scenario  w'here 
agents  are  randomly  distributed  rather  than  clustered  together, 
■here  is  less  interference  and  che  behavior  of  the  FM  coupled 
system  is  likely  to  improve. 

Also,  che  presence  of  che  radio  link  can  be  modeled  as  a 
complex  system  block  in  che  oscillator  diagram  of  Fig.  4.  If 
we  neglect  this  block,  we  are  neglecting  a  signincanc  portion 
of  che  overall  dynamics  and  yet  w'e  still  expecc  the  oscillators 
:o  entrain  as  theoretically  predicted. 

Whiie  amplitude  distortion  is  a  problem  in  AM  systems.  FM 
systems  are  plagued  by  phase  distortion.  The  phase  distortion 
results  when  the  phase  relationship  between  the  carrier  and 
•he  sidebands  are  altered  [II].  Phase  equalization  networks 
can  be  used  to  correct  this  problem.  Of  course  a  price  is  paid 
for  higher  quality  signal  processing.  One  disadvantage  of  using 
inexpensive  off-the-shelf  communications  hardware  is  that  cir¬ 
cuit  schematics  and  specifications  are  often  unavailable,  and  it 
is  difficult  to  know  exactly  what  type  of  performance  to  expect. 
Nevertheless,  our  broadcast-based  coordination  scheme  does 
not  demand  a  high  quality  FM  link.  We  are  more  concerned 
that  the  oscillators  stay  entrained  and  that  the  network  behaves 
accordingly  to  changes  in  coupling  factor,  than  we  are  with 
the  specific  entrainment  frequency. 

Figs.  9  and  10  show  a  three  oscillator  system  entrained 
to  a  common  frequency  for  a  coupling  factor  of  O.l  using 
both  hardwired  and  FM  coupling.  In  the  hardwired  case  we 
captured  the  data  as  the  oscillators  moved  from  free  oscillation 
:o  full  entrainment-  We  see  that  there  is  a  brief  transient  that 
occurs  between  the  time  when  the  oscillators  first  couple  until 
they  settle  to  the  entrained  frequency.  The  transient  was  coo 
’.ong  in  che  FM  case  to  display  che  transition  in  one  frame. 
Thus,  the  graph  shows  the  oscillators  after  they  are  fully 
entrained.  Fig.  1 1  shows  the  behavior  of  the  network  in  a 
leader-follower  configuration.  We  elect  oscillator  2  as  leader 
and  see  that  the  leader  does  not  change  its  behavior,  but 


FM,  coucling=,1,  ^1  =  1.2,  sv2=1.3,  'M3=1.4 


0.17  0.171  0.172  0.173  0.174  0.175  0.173  0.177  0.178 

time  (sec) 


Fig.  10.  FAl  coupled  osciliucor  network  entrained  to  1^13  Hz  (A  =  0.1). 


LeaCer-Follower,  Leader  Osc  2,ccupling=.3 


0.252  0.253  0.254  0.255  0.253  0.257  0.253  0.259  0.2S 

time  (sec) 


Fig.  II.  Leader-Foilower  conriguration  with  oscillator  2  rdashed  line)  as 
leader  I  A  =  0.3).  Oscillators  entrain  in  rrequency  but  not  in  phase. 

the  followers  entrain  to  the  leader’s  frequency.  Unlike  in  a 
mutually  coupled  configuration,  when  oscillacors  entrain  to  a 
leader  their  entrained  frequencies  are  not  in  phase. 

IV.  Global  Speed  Control 

Our  broadcast-based  coordinadon  scheme  will  be  applied,  in 
many  cases,  to  simultaneously  initiate  a  common  action  within 
a  swarm  of  Army  Ants.  Used  in  this  manner,  che  entrained 
frequency  serves  only  as  a  trigger  as  long  as  ail  the  agents' 
heanbeats  synchronize,  their  coupling  strength  is  irrelevant. 
Tne  global  speed  controller  requires  two  properties  of  mutually 
coupled  van  der  Poi  oscillators.  To  control  agents'  speed  we 
require  synchronizadon  ’oehavior  so  that  all  asencs  travel  at 
che  same  velocicy.  Also,  we  employ  a  property  presented  in 
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ur  results  from  Section  III:  the  entrained  frequency  increases 
.most  linearly  with  increasing  coupling  factor.  Thus,  '.ve  etrect 
:lobai  change  in  speed  by  changing  the  strength  at  wn-ch 
eanbeats  couple. 

Tne  global  speed  controller  application  was  chosen  to 
emonstrate  that  our  nnechod  could  be  used  to  tngger  other 
ehaviors.  Bv  detecting  thresholds  in  dynamically  entrained 
ariables.  practically  any  behavior  can  be  symbolically  tng- 
ered.  Sirniiariy,  the  same  approach  can  be  used  to  control 
lanv  behaviors  or  decisions.  This  also  shows  how  a  local 
ndividual)  decision  can  affect  a  global  change. 

[i\  this  section,  'we  e.xpiain  the  importance  of  global  speed 
oncrol  to  the  Army  Ant  scenario.  Tnen.  using  the  three 
eartbeat  circuits  we  show  how  to  implement  the  global 
need  control.  We  discuss  the  additional  hardware  required 
}  interface  the  heartbeat  circuit  to  motor  controller  chips  and 
j  the  Army  Ant’s  processor,  the  68HCII  microcontroller  [2]. 
.ascly.  with  one  heartbeat  integrated  into  an  army  ant  we  use 
ie  elobal  dvnamic  created  by  the  three  heartbeats  to  control 
''.e  robot's  speed.  Tne  performance  of  the  speed  controller  is 
valuaced  for  several  different  coupling  factors. 

One  situation  where  it  is  desirable  to  have  all  agents  moving 
c  the  same  velocity  is  when  agents  must  cooperatively  carry 
pallet.  .Assuming  that  agents  beneath  a  pallet  can  align 
nemselves  along  the  direction  of  motion  (see  [IS]),  we  would 
.ke  them  to  travel  at  a  common  speed  while  carrying  the 
aad.  A  pallet  only  rests  on  top  of  a  group  of  agents:  it  is  not 
isidly  attached.  In  a  group  of  agents  that  are  not  traveling  at 
uniform  speed,  slower  moving  agents  may  eventually  lose 
ontact  with  the  pallet  as  faster  agents  carry'  it  away. 

While  not  as  critical  as  the  pallet  carry'ing  example,  global 
peed  control  is  useful  for  some  ground  maneuvers.  Suppose 
,  erouo  of  agents  must  search  a  large  area.  Agents  traveling  at 
.ne  same  speed  will  cover  roughly  the  same  amount  of  ground 
.nd  expend  their  resources  at  the  same  rate.  The  intention  of 
-e  .Army  Ant  swarm  is  to  accomplish  goals  cooperatively, 
ich  no  one  agent  assuming  a  greater  role  chan  another.  By 
tcercising  speed  control  we  can  better  maintain  homogeneity 
'  i  the  level  of  activity  of  the  swarm. 

As  we  have  previously  discussed,  an  important  characteristic 
'  f  our  coordination  scheme  is  that  it  reacts  globally  to  local 
hanges.  If  for  some  reason  one  agent  in  the  heartbeat  networK 
eases  chat  it  must  decrease  its  speed  it  will  lower  its  coupling 
trensth.  causing  the  entire  heartbeat  dynamic  to  oscillate  at  a 
lower  frequency.  Tne  global  reaction  of  the  network  implies 
hat  if  a  sroup  of  agents  is  carrying  a  pallet  toward  an  obstacle, 
)nly  a  few  agents  in  front  need  to  detect  the  danger  in  order  to 
dude  disaster.  As  soon  as  the  few  agents  begin  to  slow,  their 
eaction  will  be  sensed  by  the  heartbeat  dynamic  and  every 
nember  of  the  group  will  decrease  its  speed  accordingly. 


A.  Hardware  Implementaiion 

'To  -ealize  our  global  speed  controller  we  need  a  way  to 
ransfer  information  from  the  heanbeats  to  the  motor  con- 
.rollers.  We  show  chat  this  is  accomplished  very  easily  using 
X  rreGuencv-co-vouasie  converter.  .Also,  we  w'anc  the  speed 
.0  be  affected  only  when  the  heanbeats  are  coupled.  When 


heartbeats  are  oscillating  freely  aaencs  are  not  considered  to 
be  part  of  a  coilecti're  and  do  not  need  to  be  coordinated. 

/ 1  Heartbea:  io  Mocor  CorjroHe^'  [}iieracc:  The  nrsc  stage 
of  our  interiace  is  a  frequency-ro-\oit:.ge  circuit.  .A 

frequencv-to-voitage  converter  outputs  an  a.aaiog  voltage  tout 
is  proportional  to  the  frequency  of  the  input  signal.  The 
eain.  specined  in  volts  per  Hz.  is  user  derined  oy  a  rev.' 
external  resistors  and  capacitors.  Our  motor  controller  chips 
require  a  digital  velocity  command.  Therefore,  we  need  to 
insert  an  analog-co-digital  converter  between  the  output  or  the 
i-onverrer  and  the  motor  controller.  The  motor  controller 


outputs  a  pulse  width  modulated  signal  to  an  H-bndge  circuit, 
which  drives  the  motors.  The  68HCI 1  microcontroller  includes 
an  analo2-to-dlgital  convener;  so  with  only  one  additional 
integrated  circuit  (the  FW  converter)  we  can  interface  a 
heartbeat  circuit  to  an  army  anc’s  motor  control  hardware.  The 
interface  provides  a  velocity  command  to  the  motor  controller 
that  varies  linearly  with  the  heartbeat  frequency  of  oscillation. 

The  question  arises:  How  do  we  keep  the  frequency  of 
uncoupled  oscillators  from  influencing  agents'  speed?  The  key 
lies  in  the  fact  chat,  in  a  mutually  coupled  system  of  van 
der  Pol  oscillators,  the  entrained  frequency  will  always  be 
sreacer  than  any  of  the  original  free  frequencies  regardless 
of  the  coupling  factor.  Consequently,  we  can  sec  a  frequency 
’•trip  point”  that  is  slightly  greater  chan  any  free  frequency  in 
a  network.  Assuming  that  the  frequency  separation  between 
any  two  robot  heanbeats  is  small,  the  crip  point  can  be  fixed. 
Our  motor  control  algorithm  issues  a  “do  nothing  command 
when  voltages  at  or  below  the  voltage  corresponding  to  the 
frequency  crip  point  are  received  from  the  FW  convener.  An 
output  from  the  F/V  converter  chat  is  above  the  crip  point 
indicates  chat  the  system  has  coupled.  Only  then  does  the 
velocity  command  to  the  motor  change  proponionally  with 
the  heanbeac  frequency  of  oscillation. 

Each  army  ant  shall  have  the  descnbed  interface  between  its 
heanbeac  circuit  and  its  motor  control  electronics  in  an  actual 
scenario.  In  this  expenmenc  we  use  three  mutually  coupled 
heanbeats  with  only  one  heanbeac  integrated  to  an  army  ant. 
Tnus.  we  acquire  velocity  data  on  only  one  agent.  We  have 
already  shown  that  the  heartbeats  will  entrain  to  a  common 
frequency.  Given  chat  the  trip  points  and  frequency-to- voltage 
sains  are  the  same  in  every  interlace,  we  can  assume  chat  the 
global  heanbeac  dynamic  effects  the  same  speed  command  in 
all  agents. 


B.  Experimental  Results 

The  control  of  the  Army  Anc  drive  motors  for  this  ex¬ 
periment  is  open  loop  with  respect  to  velocity,  i.e.,  we  do 
not  utilize  tachometers  to  provide  feedback.  Therefore,  only 
commanded  signals  are  at  our  disposal.  We  choose  to  acquire 
the  motor  velocity  command  data  to  evaluate  our  global 
speed  controller  rather  than  the  signals  to  the  motors.  Tne 
motor  drive  signals  are  pulse  width  modulated,  and  not  as 
convenient  a  format.  In  the  first  set  of  experiments  we  use 
hardwired  coupling,  and  observe  the  commanded  speed  to  the 
asenc’s  drive  motors  as  its  heanbeac  transitions  from  rreeiy 
oscillating  to  fully  entrained.  We  repeat  the  experiment  tor 
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TABLE  II 

Predicted  AND  .Actcal  Speed  Data  for  H.ard'-vtp^d  Colpled  System 


coupling 

factor 

encniined 

fircquency 

(Hz) 

predictsd 

speed 

(rpm) 

acmai 

speed 

from) 

0.3 

!  1724 

1  10.50 

1  10.45  i 

0.5 

1  1950 

1  11.38 

1  11.32  1 

0.7  i 

I  2136 

1  13,00 

1  12.99  i 

T.A^LE  in 

Speed  Data  from  FNI  Cocpled  Svste.m 


coupling 

entrained  i 

enoninea 

factor 

Drequency  j 

[jequency 

’n:2r:r^ired  1 

F.\{ 

F'.i 

1  0.3  1 

!  1507  i 

1674 

iO.  ‘ 

1  0.5  1 

!  1590  1 

1344 

11.23 

I  0.7  1 

1828  I 

1964 

11.91  : 

ihree  dir'ferenc  coupling  factors:  A  =  0.3.  A  =  0.5.  and 
A  =  0.7.  As  e.xpecced,  the  speed  command  increases  wuh 
increasing  coupling  strength. 

Given  the  entrained  frequency  data  collected  in  the  last 
section,  and  the  conversion  gains  used  in  the  speed  controller, 
we  can  predict  the  speed  commands  for  each  coupling  factor.^ 
Note  that  predicting  the  speed  at  which  coupled  agents  will 
travel  is  not  a  goal  of  the  global  speed  control  application.  The 
object  of  the  speed  controller  is  to  ensure  that  all  agents  move 
at  a  common  speed,  and  that  any  change  in  an  individual's 
speed  will  be  redecced  in  evei^^ agent  whose  heanbeac  is  part 
of  the  dynamic.  Tne  entrained  frequency  is  dependent  upon 
rhe  number  of  agents  in  the  network.  For  large  systems  it  may 
be  inconvenient  to  predict  the  speed,  and  impossible  if  the 
number  of  agents  is  unknown.  Besides,  the  Army  .Ant  concept 
does  not  support  deterministic,  centrally  computed  knowledge. 
Yet  comparing  the  predicted  speed  to  the  data  we  acquired 
provides  a  useful  check  to  validate  our  controller. 

Table  II  shows  the  predicted  speeds  for  the  three  coupling 
factors  based  upon  the  entrained  frequencies  acquired  in 
Section  III,  as  w-ell  as  the  speed  data  collected  in  these 
experiments.  The  fact  that  the  predicted  data  matches  the 
actual  data  so  closely  indicates  that  the  frequency-to- voltage 
conversion  remains  linear  over  a  wide  frequency  range. 

Tne  same  experiments  were  conducted  using  FNI  coupling. 
While  we  can  expect  the  entrained  frequency  to  remain  con¬ 
stant  for  i:  given  coupling  factor  and  a  sec  number  of  agents 
when  hardwired  coupling  is  used,  this  is  not  the  case  when 
heartbeats  are  coupled  through  FM.  Just  as  we  experience 
degrees  in  the  quality  of  reception  on  our  FM  radios,  the 
performance  of  our  broadcast-based  system  is  subject  to  vary 
somewhat  with  e.xtemai  interference.  Therefore,  it  is  not  very 
meaningful  to  compare  the  predicted  speed  to  the  actual  speed 
in  the  FSi  case.  We  have  already  validated  the  speed  controller 
with  our  hardwired  data,  so  we  present  the  FM  data  in  a 
slightly  different  manner.  Using  the  speed  data  acquired  with 
FM  coupling,  we  can  work  “backward"’  and  estimate  the 
entrained  frequencies.  The  entrained  frequencies  from  these 
experiments,  when  compared  to  the  data  from  the  hardwired 
coupled  system,  provide  on  example  of  how  the  performance 
of  the  FM  system  can  deviate.  Table  m  summarizes  the  data 
for  three  different  coupling  factors. 

The  error  between  the  acquired  speeds  for  the  hardwired 
and  cases  is  under  5%  for  A  =  0.3  and  A  =  0.5  and 
under  109^  for  A  =  0.7.  The  A/D  oucpuL  which  provides  the 
speed  command  to  the  motor  controller  chip,  was  acquired  and 

'The  inraned  frecuencv  divided  by  the  F/V  gain  gives  an  analog  voitage. 
This  voitase.  when  muitipiicd  by  the  maximum  rpm  (25  rpm;  and  divided  by 
:r.e  maximum  A/D  voltage  (5  V).  provides  the  expected  speed. 


harcAired,  ccupiing=.5 
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Fig.  12.  Global  speed  for  a  hardwired  oscillator  network  with  a  coupling 
strength  of  A  =  Q.5. 

converted  to  an  rpm  value  in  our  software  routine.  Figs.  12 
and  13  show  plots  of  the  commanded  speed  for  cases  in 
which  the  heartbeats  were  coupled  via  hardwired  connections 
and  FM.  At  an  arbitrary  time  during  the  data  acquisition  the 
oscillator  receivers  were  turned  on.  The  subsequent  step  in  the 
speed  occurs  as  the  system  transitions  from  an  uncoupled  to 
a  coupled  system.  The  level  of  the  commanded  speed  of  the 
coupled  system  is  dependent  upon  the  coupling  factor  used. 
Notice  chat  the  graphed  speeds  are  all  integer  values.  Ail 
velocity  commands  were  truncated  by  the  algorithm  during 
implementation.  The  velocity  data  shown  in  Tables  II  and  III 
was  obtained  through  examination  of  the  raw  output  from  the 
.A/D. 

We  have  shown  that  the  heanbeat  dynamic  can  easily 
be  applied  as  a  global  speed  controller.  To  interface  the 
heanbeat  circuit  to  an  agent’s  motor  control  hardware,  only 
one  additional  IC  is  required.  The  essence  of  the  speed  con¬ 
troller  interface,  the  frequency-to-volcage  convener,  exhibited 
a  highly  linear  response  over  a  wide  range  of  input  frequencies. 

The  performance  of  the  speed  controller  compared  favorably 
to  predicted  results.  We  showed  that  changes  in  coupling 
strength  produce  the  same  directional  trend  in  the  commanded 
velocity  to  an  agent's  drive  motors.  Additionally,  we  have 
verined  that  the  error  introduced  by  coupling  the  system 
chrough  an  FM  communication  link  is  at  an  acceptable  level. 
In  short,  our  broadcast-based  coordination  scheme  was  suc¬ 
cessfully  and  simply  applied  to  one  of  the  more  complex 


507 


•t-.N  E’.ANS  ?>EaCTIVH  COORDINATION  SCHEME 
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j.  Global  speed  tor  an  FM  coupled  oscillator  network  with  a  coupling 
in  Of  A  =  6. 3. 

viors  reauiring  coordinacion.  The  results  attained  in  our 
d  control  experiments  serve  to  reinforce  our  confidence 
:he  heartbeat  concept  is  a  viable  method  tor  coordinating 

V  actions  within  the  Army  Ant  swarm. 

V.  Other  applications/Ushs 

re  applications  of  coupled  nonlinear  oscillators  in  the 

V  Ant  scenario  are  not  limited  to  leader  selection  and 
nronization.  In  this  section  we  explain  the  use  of  similar 
.lacor  schemes  for  error  detection. 

rror  Detection  in  Distributed  Load  Bearing 

j  lift  and  carrv'  a  palletized  load.  Army  Ant  robots  need 
c:  together  to  establish  a  stable  transportation  mechanism. 
:ethod  for  controlling  the  collective  behavior  of  Army  Ant 
cs  for  load  transpoaation  is  described  in  [IS].  It  is  achieved 
>eiec:in2  a  leader,  without  direct  communications,  using 
force  sensors  mounted  at  the  point  ot  contact  with  the 
’t.  Tne  method  described  in  [IS]  results  in  stable  behavior 
horizontal  movement  of  agents  under  the  pallet.  The 
lod  described  here  differs  in  the  type  of-communication 
i;  in  [IS],  coupling  is  a  physical  force  transmitted  by  the 
oad:  here,  we  use  a  broadcast  signal  which  can  be  used 
;  no  physical  contact  at  ail. 

unng  lifting  and  transportation  of  pallets  it  may  be  impor- 
that  the  pallet  be  supported  equally  (to  some  degree)  by 
ooocs.  It  is  possible  to  detect  this  condition  by  comparing 
iensed  vertical  force  on  agents.  However,  since  there  is  no 
-.rai  controller,  this  has  to  be  done  collectively.  Assuming 
:ne  robots  under  the  pallet  can  receive  signals  from  their 
n mates,  they  can  form  a  ring  or  fully  coupled  network 
ire  the  feedback  gains  of  the  oscillator  are  linear  functions 
r.e  output  of  vertical  force  sensors  (Fig.  4). 

"re  ladition  of  a  variable  feedback  gain  to  the  oscillator 
:ei  enables  the  robots  to  detect  whether  the  pallet  is  level 


and/or  the  load  distribution  is  even.  Feedback  gain  factor  Aii 
ahanses  over  time  as  a  function  of  the  output  of  the  force 
sensor.  Therefore,  the  feedback  gain  of  the  each  oscillator 
changes  according  to  the  vertical  force  compone.nt  sensed  by 
the  robot. 

Usin^  the  incoming  signals  from  ocher  robots  and  force 
sensor  output,  robots  can  determine  the  difference  e.  between 
their  own  sensor  and  the  sensor  of  the  “previous'*  robot  (or  the 
average  of  all  incoming  signal  in  the  full  coupling  case).  In 
ocher  words,  the  signal  e:  is  a  relative  measure  of  discrepancy 
in  force  sensor  outputs.  Robots  that  detect  a  difference  beyond 
the  predefined  parameters  can  broadcast  a  warning  signal. 
Transportation  scans  and  continues  as  long  as  there  is  no  robot 
broadcasting  a  warning  signal  caused  by  this  or  any  ocher 
problem.” 

The  error  signal:  ^  Aj^a.*; 

(In  ring  coupling.  7=^-1  only). 

In  the  following,  the  oscillators  use  ring  coupling  and  all 
coupling  coefnciencs  are  set  to  -0.3.  The  feedback  gains  A.-y 
vary  according  to  the  linear  mapping^  from  internal  sensor  to 
2ain.  When  all  gains  are  approximately  equal,  entrainment  is 
reached  after  a  time  with  frequencies  in  the  range  0.8- 1.2. 
For  our  purpose,  we  sec  all  oscillator  frequencies  w"  to  1. 
How'ever.  the  amplitude  of  the  oscillations  will  differ  from 
a2enc  to  agent  if  the  feedback  gains  are  not  the  same.  Each 
agent  is  able  to  compare  its  output  with  the  output  signal  of  the 
coupled  teammate.  In  addition,  the  ring  connguracion  enables 
each  robot  to  detect  the  sensor  output  variations  occurring  in 
any  other  teammate  by  comparing  its  sensor  output  only  to  the 
previous  robot  in  the  conhguradon.  Because  all  such  compar¬ 
isons  accumulate  as  coupling  signals  pass  along  the  network, 
the  errors  in  feedback  gains  (force  sensors)  can  be  detected 
by  other  oscillators,  even  if  they  are  not  nearby.  The  robots 
may  adjust  their  oscillator  feedback  gains  by  exerting  more 
of  less  force  or  by  moving  to  more  desirable  spots— -thereby 
equalizing  vertical  forces  until  oscillators  agree. 

In  full  coupling,  the  entrainment  is  much  faster  than  in  the 
previous  case  because  of  the  multiple  effects  on  each  agent. 
The  coupling  coefncient  is  chosen  smaller  (~0.i)  to  reduce  the 
stren2th  of  incoming  signals.  The  error  signal  is  much  smaller 
in  ma2nicude  for  a  fully  coupled  netw’ork.  For  this  reason, 
in  error  detection  applications,  a  ring  coupled  network  may 
be  preferred.  The  difference  between  full  and  ring  coupling 
methods  are  summarized  in  Table  IV. 

It  may  be  necessary  to  adjust  the  allowable  ranges  for 
error  signals  so  chat  problems  may  be  detected  regardless  of 
the  number  of  agents.  As  long  as  the  number  of  agents  is 
greater  than  four,  the  same  parameters  for  allowable  ranges 
are  likely  to  work  for  ail  cases.  When  additional  oscillators 
are  successively  added  to  the  system,  the  change  in  the  signal 
levels  is  less  and  less  signihcanc-  Addition  of  new  oscillators  to 
a  ring  network  of  eight  oscillators  does  not  have  a  perceivable 

^Of  course,  an  e.xpiicit  communication  method  could  be  used  to  detect 
uneven  load  distribution  at  the  e.xpense  ot  added  comple.xicy  and  cost. 

'  Ma.%imum  and  minimum  values  of  (he  sensor  output  corresponds  to  gains 
l.j  and  0.7.  respectively. 
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Feedback  gains  and  error  signals  or  .ive  oscillators  in  full  (top)  and  ring  'bottom)  coupling  (A  =  0.1  for  full.  A  =  0.3  for  nns  conrisuration j. 


TA3LE  IV 

DtFFE.^£.VCE3  3Er.VEH.V  Fl'LL  .\S0  RiNG  COLTLLN’G  NIETHODS 


i 

Full  Coupling 

Ring  Coupling 

communicacioti 

broadcast 

filtered  using  ID 
numben 

imar  risaai: 

levci 

reiauvely  low 

relatively  high 

propagation 

dsiay 

no 

yes 

coupling  cocf.s  i 

1  —0.  i  or  less  1 

1  -0.3 

entrainment  i 

1  fasx 

slow 

addidonaJ 

requirements 

torn!  number  or 
robots  in  a  team 
must  be  known  for 
averaging 
purposes 

agents  must 
"choose"  another 
for  coupling 

effec:.  Similar  characceristics  are  observed  in  the  period  and 
ampiicude  of  the  entrained  oscillators.  Furthermore,  a  discrete 
mapping  function  from  sensor  output  to  the  feedback  gain  may 
improve  the  detection  of  range  violations. 

In  simulation  examples  given,  we  changed  the  feedback  gain 
of  only  one  agent,  and  plotted  e/s  computed  by  ail  agents. 
.As  show'n  in  Fig.  14.  at  least  two  agents  are  able  to  detect 
ihe  error  in  both  coupling  methods,  although  detection  by 
one  agent  is  sufficient  for  the  swarm  to  become  aware  of  the 
problem:  a  large  value  of  the  magnitude  of  error  signal  (in  one 


or  more  agents)  is  an  indication  of  uneven  load  distribution. 
Tne  threshold  value  for  error  detection  is  not  the  same  for 
both  coupling  configurations.  Because  the  error  signal  is  much 
smaller  in  magnitude  for  a  fully  coupled  network,  the  threshold 
value  must  be  chosen  accordingly. 

The  stability  of  the  coupled  oscillator  system  given  here 
is  a  theoretically  unproved  issue.  We  are  aware  of  no  such 
proof  for  our  general  oscillator  circuitry.  With  the  addition  of 
unmodeled  RF  circuit  dynamics,  such  a  proof  is  unlikely. 

VI.  COMCLL'SIO.N' 

Distributed  multi-agent  systems  such  as  the  army  ants 
offer  new  and  promising  possibilities  in  the  application  of 
robotics  to  industry.  Despite  the  many  advantages  inherent 
CO  distributed  robotics  systems,  they  have  traditionally  been 
very  difficult  to  coordinate  and  control  due  to  the  absence 
of  a  central  supervisor  or  a  hierarchy  of  command.  .A.gencs 
in  a  distributed  system  muse  be  capable  of  collectively  ac¬ 
complishing  tasks  using  only  locally  sensed  informadon  and 
Uctle  or  no  direct  communicadon.  Toward  this  goal,  this  paper 
has  introduced  a  broadcast-based  coordination  scheme  that 
provides  a  global  group  dynamic  chat  can  control  individual 
agents,  is  infiuenced  by  all  agents,  but  does  not  reside  in  any 
agent. 

Our  scheme  addresses  many  of  the  coordination  problems 
that  exist  in  the  .Army  Ant  scenario  wichout  compromising 
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y  of  che  project's  pnnciples.  In  adhering  to  the  principies 
homoeeneicy  and  distributed  control  '.ve  maintain  ail  of  :ne 
•nefits  associated  with  the  Army  Ant  concept. 

In  che  fully-coupled  broadcasc*based  coordination  scheme. 
;ents  are  never  addressed  directly;  all  communication  is  indi¬ 
ct.  in  the  form  of  broadcasts.  Unlike  direct  communication 
stems  which  require  a  larger  bandwidth  as  the  number  of 
;ent5  increases,  our  method  is  completely  scalable.  The  group 
na.mic  automatically  adapts  to  changes  in  the  number  of 
:ents  m  a  coupled  network.  In  fact,  it  reacts  only  to  the 
moosite  of  all  the  coupled  heartbeat  signals  without  regard 
the  number  of  heartbeats.  .As  such,  the  scheme  is  ideally 
iced  for  a  dynamic  swarm,  w'herein  army  ants  adjust  their 
:e  to  dt  the  task  at  hand, 

Vy'hile  our  coordination  scheme  requires  the  assignment 
a  unique  heartbeat  frequency  to  each  agent,  we  in  no 
av  violate  agents’  homogeneity.  The  heartbeats  are  never 
ed  for  the  purpose  of  identification.  They  can,  however. 

1  used  for  discrimination  during  leader  selection.  We  have 
own  chat  our  scheme  can  be  used  to  entrain  agents  to  a 
ader’s  frequency.  Yet.  most  of  che  proposed  applications  or 
e  coordination  scheme  (e.g.,  synchronized  steering,  lifting. 
:d  speed  control)  only  require  entrainment  to  any  com- 
on  frequency.  Furthermore,  each  heartbeat  can  replace  any 
ner  heartbeat  in  a  coupled  network,  and  the  failure  of  a 
lartbeac  circuit  results  only  in  that  particular  agent  being 
.eluded  from  the  coordination  effort.  Thus,  our  coordination 
ethed  preserves  che  innate  robustness  of  the  .Ajmy  .Ant 

Just  as  che  .Army  .Ant  swarm  is  a  reactive/behavior-based 
stem,  so  too  is  the  coordination  scheme's  network  ot  cou- 
ed  heartbeats.  The  proposed  method  does  not  require  any 
'  asoning  or  planning.  Rather,  we  exploit  che  known  properties 
couoied  nonlinear  oscillators  to  create  the  global  group 
/na.mic  chat  controls  agents’  actions.  The  dynamic  naturally 
acts  to  any  changes  in  the  network;  because  the  dynamic 
I  'nsists  of  che  composite  signal  of  ail  heanbeacs  in  a  network, 
is  sensitive  to  variations  in  individual  heartbeats.  Tne 
acti%e  nature  of  our  scheme  allows  the  army  ants  to  respond 
lickly  in  a  dynamic  environment. 

The  primary  beneiic  of  this  research  is  that  the  broadcast- 
.sad  coordination  scheme  was  validated  by  the  actual  con- 
metion  and  testing  of  a  system  of  heanbeat  circuits.  By 
i  aiizing  che  concept  with  actual  hardware,  we  are  forced  to 
'Old  making  assumptions  that  may  later  prove  impractical. 
:)me  of  the  results  are  summarized  as  follows: 

•  The  behavior  of  the  realized  heartbeats  compared  favor¬ 
ably  to  the  predicated  behavior  that  was  obtained  through 
simulation.  .An  actual  network  of  three  heanbeacs  was 
shown  to  entrain  either  to  a  leader  frequency  or  to  a  com¬ 
mon  frequency  over  a  wide  range  of  coupling  strengths, 
in  the  presence  of  completely  unmodeled  transceivers. 

•  [t  is  Dossible  to  broadcast  and  receive  heartbeats  using 
an  inexpensive  FAI  communication  system.  The  F^'I  link 
i.ncroduces  some  distortion  into  che  network  of  coupled 
oscillators,  altering  its  behavior  somewhat.  The  devia- 
:;ons  produced  by  coupling  the  system  with  FAt  do  not 
adversely  affect  performance  goals  of  the  system. 
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•  The  broadcast-based  coordination  scheme  can  be  used 
CO  synchronize  many  actions  such  as  lifting  or  steering, 
[c  can  also  be  applied  :o  more  complex  behavior,  as 
demonstrated  in  the  global  speed  concroiler  exa.mpie.  We 
have  de.monstraced  a  simple  controller  chat  detects  when 
an  agent  is  pan  of  a  giebai  group  dynamic  then  allows  the 
dynamic  to  control  an  agent's  speed.  In  this  manner,  ail 
agents  in  a  netw'ork  travel  at  the  same  speed  and  increase 
or  decrease  their  speed  in  accord. 

•  .A  proposed  technique  for  error  detection  was  devised 
which  would  allow'  one  robot  to  adapt  its  contnbution 
to  better  match  chat  of  its  teammates.  This  automatic 
error  detection  and  correction  acts  as  a  regulating  group 
dynamic. 

Pnere  are  a  few  areas  relating  to  the  coordination  of  che 
.Army  .Ant  swarm  chat  require  further  research: 

•  Further  investigation  on  che  conditions  under  which 
agents  will  be  permitted  to  var>'  their  heartbeat  pa¬ 
rameters.  .Algorithms  for  executing  this  logic  must  be 
developed  and  tested. 

•  The  process  of  electing  a  leader  deserves  further  consider¬ 
ation.  .Although  we  want  to  avoid  a  fixed  hierarchy,  some 
form  of  hierarchy  in  form  of  ''temporary  leaders”  results. 
Additionally,  we  must  investigate  how  to  implement  the 
idea  of  selecting  as  leader  che  agent  whth  the  highest 
heartbeat  frequency  with  a  minimum  of  communication 
and  complexity. 

•  The  use  of  .Al  methods  for  adaptive  coupling  parameters 
may  prove  useful.  To  create  more  robust  and  "intelligent’’ 
system,  genetic  algonchms  and  classifier  systems  may 
be  used  to  realize  a  method  for  on-line  computation  of 
necessary  coupling  coefficients  for  a  panicular  task  and/or 
situation. 
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ABSTRACT 

The  Unmanned  Ground  Vehicle  Competition  is  jointly 
sponsored  by  the  SAE,  the  Association  for  Unmanned  Vehicle 
Systems  (AUVS),  and  Oakland  University.  College  teams, 
composed  of  both  undergraduate  and  graduate  students,  build 
autonomous  vehicles  that  compete  by  navigating  a  139  meter 
ouidooc  obstacle  course.  The  course,  which  includes  a  sand  pit  and 
a  ramp,  is  defined  by  painted  conunuous  or  dashed  boundary  lines 
00.  grass  and  pavement.  The  obstacles  are  arbitrarily  placed,  multi- 
coiorea  plastic- wrapped  hay  bales.  The  vehicles  must  be  between 
0.9  and  2,7  meters  long  and  less  than  1 .5  meters  wide.  They  must 
be  either  electric -motor  or  combustion-engine  driven  and  must 
carry  a  9  kilogram  payload  AJl  computational  power,  sensing  and 
control  equipment  must  be  earned  on  board  the  vehicle.  The 
lechnoiogies  employed  are  applicable  in  Intelligent  Transportauon 
Systems  (ITS). 

A  written  design  report  and  an  oral  presentation  are  required 
irom  -each  team,  and  expert  judges  evaluate  these  along  with 
inspecimg  the  actual  vehicles.  Design  judging  focuses  primarily  on 
uhfi  design  process  rather  than  the  implementation  of  that  design  in 
the  acniai  vehicle.  The  U.njsr  feature  is  evaluated  by  performance 
on  the  obstacle  course.  Tnc  team  winning  the  design  contest 
receives  a  SI 000  award  from  SAE  and  is  offered  the  opportunity 
to  present  their  design  paper  at  the  SAE  World  Congress.  The 
1 996  compeation  was  held  at  Walt  Disney  World  in  Orlando  on 
July  13-15. 

This  paper  presents  the  conceptual  design  of  the  vehicle  and 
iis  components.  Innovative  aspects  of  the  design  arc  highlighted, 
along  wih  descriptions  of  the  electronics,  software,  computers, 
actuators,  sensors,  and  the  means  of  system  integration.  The  steps 
followed  ’.in the  design  process  arc  described  along  with  the  use  of 
computcr'-aided  design.  Considerations  of  safety,  reliability,  and 
durability  vc  included.  The  analyses  leading  to  the  predicted 
pcnormanccof  the  vehicle  (speed,  ramp  climbing,  reaction  times, 
etc.)  are  ajs«  df^cumented.  Although  not  a  factor  In  judging,  the 
paper  also  a. cost  estimate  (not  counting  smdent  labor)  for 

die  ilnai  proctact  if  h  were  be  duplicated. 

i>rrRODUcnoN 

CAL^/TN  (Computerized  Autonomous  Land  Vehicle  with 
.atciligent  Navigation),  shown  in  Figure  1,  is  a  battery  powered 


three -wheeled  vehicle  with  a  computer  vision  system  for  line 
following  and  ultrasonic  sensors  for  obstacle  avoidance.  CALVIN 
was  one  of  two  vehicles  entered  by  the  Virginia  Tech  Team  in  the 
4th  Annual  International  Autonomous  Ground  Vehicle 
Competition. 


Figure  1  :  Overview  photo  of  CALVIN 


This  is  the  first  year  that  Virginia  Tech  has  participated  in  the 
competition.  As  pan  of  our  preparation,  team  mcmbeis  reviewed 
technical  papers  written  by  recent  competitors  and  video  tapes 
from  the  past  two  competitions.  It  became  clear  that  the  contest 
presents  a  formidable  task  and  that  a  robust,  wcU-dcvcloped  and  ! 

carefully  tested  design  would  be  a  basic  requirement  for  success.  \ 

With  this  in  mind,  the  team  made  a  coocened  effon  to  pursue 
simple,  rsiiable,  cost-effeedve  designs  for  the  base  componcncs  and  i 

subsystons.  This  is  also  the  approach  recommended  by  Gifford,  et  1 

ai.,  [1995]  as  pan  of  their  winning  philosophy  in  the  1995  ^ 

competition.  Keeping  this  approach  in  mind,  CALVIN’s  design 
incorporates  many  of  the  features  that  have  been  used  successfully 
m  recent  competitions.  For  example,  like  several  past  entrants, 

CALVIN  is  based  on  the  chassis  of  a  three-wheel  golf  can. 


Allhou^.  ihis  plaaorm  is  larger  Lhan  Lhe  ideal  vehicle,  u  does 
provide  a  nis^cd,  stable  base  with  ample  space  for  equipment  and 
payload  .Also  following  past  successes,  CALVIN  uses  a  computer 
Msion  sv'stem  for  Une  follow'ing  and  ultrasomc  sensors  for  obstacle 
avoidance.  The  innovaave  elements  of  the  design  can  be  found  in 
those  areas  that  caused  failure  or  were  recognized  as  problems  in 
otherwise  successful  vehicles  from  past  compeuiions.  These 
mnovativc  features  are  discussed  in  detail  later  in  this  paper. 

PROJECT  ORGAiVI2L\TION 

PROJECT  TE*AMS  -  As  one  of  the  largest  technical 
universiues  m  tie  countrv',  Virgmia  Tech  has  a  talented  and  diverse 
snident  popoiaiion  from  which  to  draw.  .AJ though  many  large 
projects  (s<aci  as  the  SAE  Mim^Baja  and  Solar  Car)  have  been  run 
in  the  pas:.,  uha  autonomous  robooic  project  generated  unparalleled 
e.xc:temrat among  students  and^ulty.  A  small  team  of  volunteers 
’oegan  to  organize  the  project  m  the  summer  of  1 995.  By 

the  fail  semester,  the  combined  team  working  together  on  the  two 
vehicles -‘bad  grown  to  ainioss  50  students  from  across  the  Colleges 
ofEng?.n«nng  and  Arts  and  Science.  To  direct  the  efforts  to  build 
CALVTif;  four  separate  artnups  were  formed  to  cover  Mechanical 
Desizm.  Sensing  &  Ccsnirol.  Computer  Vision,  and  Project 
Organjzation  with  each  roup  subdivided  to  distnbute  the  work. 
To  fadlicate  group  coriiinimcauon,  meetings  were  held  twice  a 
week  ^ith  monthly  presenauons  given  by  each  group. 

PROJECT  TIMELBE  -  Organization  and  scheduling  is  a  key 
ingredient  in  the  success  of  a  project  of  this  scope  and  duration. 
.Although  many  intermeiiate  objecuves  have  been  achieved  along 
the  w^,  major  project  criiotones  are  shown  on  the  general  project 
timeiine  at  the  end  of  tlTcpapcr. 


DESIGN  PROCESS  OVERVEEW 

GENERAL  DESIGN  DESCRIPTION  -  The  general  overall 
design  of  CALVIN  will  be  desenbed  with  reference  to  the 
photographs  shown  as  Figures  2  and  3,  Figure  2  shows  a  top  view 
of  C.ALVTN  with  the  weather-protecuve  top  cover  removed.  Tne 
general  layout  of  the  base  vehicle,  along  with  a  num’oer  of  the 
major  components,  arc  visible  in  this  picture.  These  components 
include  the  24  volt  dnvc  motor,  timmg-belt  drive  and  differential; 
the  12  volt  linear  steering  actuator  with  integral  feedback 
potentiometer,  the  microcontroller,  the  on-board  emergency  stop 
(c-stops);  the  three  1 2- volt  bancries  and  the  antenna  for  emergenev 
scop  and  radio  conuol  when  not  in  compeuuon.  The 
microcontroller  and  electronics  mount  to  a  hinged  shelf  that  opens 
to  give  access  to  the  pentium-based  PC  and  power  supplies. 
Figure  3  is  a  front  view  of  CALVIN  that  shows  the  five 
bumper-mounted  ultrasomc  sensors  and  the  two  sidc-viewmg 
cameras,  along  with  some  of  the  weatherproof  Lexan  used  to 
encase  the  vehicle. 

DESIGN  TOOLS  -  .As  with  all  aspects  of  our  design,  the  tools 
used  in  the  creative  process  were  dictated  by  the  overall  objecuve 
of  producing  a  safe,  competitive  vehicle.  All  Virginia  Tech 
engmeenng  students  are  trained  in  the  use  of  either  .AutoCad  or 
Cadkey,  and  both  were  used  extensively  in  the  design  process.  The 
uses  of  computer-aided  design  ranged  from  preparing  a  test  course 
layout  with  a  shape  similar  to  the  sample  course  shown  in  the 
contest  rules,  to  detailed  layout  of  the  vehicle  systems  and 
subsystems.  As  an  example,  AutoCad  was  used  to  produce  a 
layout  (Figure  4)  of  the  existing  base  frame  of  the  golf  can. 
Attachments  and  modifications  were  then  based  on  these  drawings. 
Other  software  used  during  the  course  of  the  project  included 


Figure  2:  Top  view  photograph 
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Figure  3:  Front  view  photograph 

spreadsheets,  TKSolvcr!,  MathCad  and  Mathematica  for  general 
maihemaacal  modeling  and  MatLab  for  control  and  vibration 
analysis.  Specialized  software  was  also  wntten  to  aid  with  steering 
kincmaucs  and  vision  system  modeling. 

DESIGN  CRITERIA  (SAFETY,  RELIABILITY  AND 
DURABILITY)  -  The  most  critical  design  criteria  were  those  that 
involved  safety  and  those  necessary  to  meet  contest  rules.  Beyond 
these  fundamental  requirements,  a  contmuing  effort  has  been  made 
to  incorporate  reliability  and  durability  into  the  fmal  design.  This 
goal  has  been  pursued  both  through  the  careful  specification  of 
proven,  reliable  components  and  through  extensive  field  testing 
during  every  phase  of  the  design.  In  subsequent  sections,  bnef 
discussions  will  help  to  highlight  the  effon  to  produce  an  mhercntly 
safe,  reliable  vehicle.  Safety  was  also  a  top  priority  in  ail  shop 
work  and  during  field  testing. 


F.4IL-SAfE  BRAKES  -  Acceptable  braiong  performance  was 
assured  by  using  the  pnmaiy  components  from  the  ongmal 
goif-cart  braking  system.  The  only  difference  is  that  the  acniaung 
force  normally  applied  by  the  user  through  the  foot  pedal  has  been 
replaced  by  a  fail-safe  input  force  from  a  pncumauc  acuiaror.  .An 
eiectro-mecharucai  valve  and  relay  system  control  the  operation  of 
the  pneumatic  acoiator.  While  the  vehicle  is  m  operation,  the  mam 
power  bus  closes  the  relay,  powenng  the  valve,  thus  keepmg  the 
brakes  disengaged.  When  the  circuit  loses  power,  valve  power  is 
lost  and  the  brakes  are  engaged.  This  condition  occurs  when  either 
the  onboard  E-stop  or  the  remote  E-siop  is  acuvated. 

STEERING  ACTUATOR  AND  CONTROLLER  -  The  rack- 
and-ptnion  steering  system  on  the  ongmal  golf  cart  was  replaced 
with  a  12-volt  DC  linear  actuator  with  integrated  position 
feedback.  The  motor  in  this  system  is  controlled  by  a  Pulse  Width 
Modulation  (PWM)  controller.  The  input  signal  to  this  controller 
can  be  generated  by  cither  the  microcontroller,  in  autonomous 
mode,  or  a  radio  control  receiver  in  manual  override  mode. 
Manual  ovemde  mode  is  used  for  transportation  of  the  vehicle. 

DRIVE  MOTOR  AND  CONTROLLHl  -  CALVTNs  primary 
drive  system  uses  a  0.746  kW,  24V  DC  motor  with  tachometer 
feedback  for  precise  velocity  control.  Power  is  delivered  to  the 
rear  wheels  through  a  1.8  to  I  timing  belt  reducer  and  the  existing 
6  to  1  golf  cart  differential.  The  speed  of  this  motor  is  controlled 
using  a  PV/M  H-Bridge  controller.  The  concrollcr  is  capable  of 
adjusting  its  PWM  duty  cycle  so  as  to  produce  an  cffecuvc 
continuous  output  voluge  from  -24  to  +24  volts  and  a  current  of 
160  amps  continuous  and  320  amps  peak.  Like  the  steering 
syston,  the  input  signal  to  the  controller  can  be  generated  by  cither 
the  microcontroller  with  active  velocity  control,  in  autonomous 
mode,  or  a  radio  control  receiver  in  manual  override  mode. 

ULTRASONIC  AND  TACTILE  OBSTACLE  SENSORS  - 
CALVIN  is  equipped  with  two  types  of  sensors  in  order  to 
facilitate  obstacle  avoidance.  First,  a  fan-shaped  array  of  five 
ultrasonic  sensors  is  used  to  locate  obstacles  potenuaily  in  the 
vehicle’s  path.  In  the  event  that  an  ultrasonic  sensor  detects  an 
obstacle,  an  algorthim  is  run  in  order  to  detennine  how  the 
vehicle’s  path  should  be  changed.  This  routine  considers  the 
vehicle’s  distance  to  the  obstacle,  and  which  ultrasonic  sensors  ui 
the  array  detected  the  obstacle. 

The  tactile  sensors  consist  of  three  push-bars  on  the  front  of 
the  vehicle.  These  sensors  arc  used  to  indicate  that  the  vehicle  has 
collided  with'  an  obstacle.  In  the  event  that  a  tactile  sensor  is 
actuated,  an  inierrupi  routine  immediately  stops  the  vehicle  and 
performs  a  reverse  maneuver.  The  reverse  maneuver  steers  the 
vehicle  based  upon  which  tactile  sensor  was  actuated,  in  order  to 
relocate  the  vehicle  further  from  the  obstacle.  The  vehicle  then 
returns  to  nonnal  operation  and  continues  on  its  path  around  the 
track.  The  ultrasonic  and  tactile  sensor  arrays  can  be  seen  in 
Figure  5. 

COMPUTER  AND  SOFTWARE  •  The  vision  system  is 
implemented  using  a  pentium  lOOMHz  PC  and  an  FFl  DSP  Frame 
Grabber  from  Current  Technology,  Inc.  All  vision  algorithms  were 
developed  usmg  standard  C  and  the  frame  grabber’s  C  libranes. 
Stai^dard  C  is  also  used  to  code  and  compile  the  software  necessary 
3  to  communicate  between  the  computer  and  the  microconcroiler. 


Figure  5:  Ultrasonic  and  Tactile  Sensor  Arrays 

The  vision  system  is  based  on  two  cameras  looking  to  the  left 
and  right  of  the  vehicle,  but  only  one  camera  is  active  at  a  time. 
>^'ben  the  active  camera  obtains  a  senes  of  three  consecutive  poor 
images,  the  system  switches  to  the  view  on  the  other  side.  Image 
processing  is  performed  on  a  view  of  the  course  directly  adjacent 
to  the  vehicle.  Of  this  whole  image,  only  two  smaller  regions  of 
interest  (ROI)  are  considered  to  be  significant  All  image 
processing  is  then  performed  on  these  ROIs  only,  in  order  to  save 
computation  time.  The  frame  grabber  first  thresholds  the 
gray-scaic  camera  image  to  convert  the  image  to  binary  black  and 
white.  The  thresholding  routine  is  dynamically  adjusted  for  each 
ROI  to  recognize  the  white  paint  on  grass  as  the  brightest  portion 
of  die  image.  This  means  that  the  ROI  images  used  to  navigate 
consist  of  white  line  segments  on  an  all-black  background.  The 
vision  code  then  uses  these  two  ROI  to  detect  a  forward  and  a  rear 
segment  ol  the  course  line.  In  order  to  discern  the  line,  the  routine 
nnds  the  grouped  blobs  which  it  extracts  as  a  segment  of  the  line, 
me  locauon  of  the  centroids  of  these  blobs  arc  then  compared  to 
those  found  when  initializing  the  vehicle  in  the  center  of  the  course. 
This  inlormauoQ  is  used  to  determine  the  vehicle’s  distance  and 
angle  relative  to  the  line  deiining  the  course.  The  superposition  of 
this  zeroth  order  and  first  order  path  informauon  results  in  a 
desired  steering  angle.  This  angle  is  then  processed  to  be 
transmined  to  the  microcontroller. 

The  microcontroller  integrates  the  desired  steering  angle 
(based  solely  on  vision)  with  information  from  the  ultrasonic  and 
cacuie  sensors.  Priority  must  be  assigned  to  obstacles  potentially 
in  die  vehicle's  path;  therefore  tactile  and  ultrasonic  sensor 
information  is,  in  some  cases,  weighted  more  heavily  than  the 
vision  in  determining  the  actual  steering  angle.  Consequently,  the 
microconmolia*  combmes  the  three  inputs  to  determine  the  path  of 
die  vehicle,  in  cftcct,  the  microcontroller  peribrms  all  onboard 
navigaaon. 

^SENSORS,  ELECTRONIC  LAYOUT  AND  SYSTEM 
INTnGRATION  -  Figure  6  is  a  schematic  diagram  showing  the 
general  archirecrurc  and  integration  of  the  sensors,  electronics  and 
comeuters  used  on-board  CALVIN.  Once  an  image  has  been 
processed,  the  PC  sends  a  steering  angle  to  the  microcontroller  via 
a  senai  connecuon.  If  the  course  boundary  line  is  not  clearly 
detfirird  by  the  vision  system  in  several  successive  frames,  the  PC  4 


also  sends  a  command  to  the  mjcrocontroller  to  switch  to  the 
second  camera,  and  the  vision  system  begins  traclcmg  the  opposite 
line.  Ultrasonic  signal  processing  as  well  as  integration  and  direct 
control  funcuons  are  handled  by  the  Motorola  68HC 11 -based 
mjcrocontroller.  This  microcontroller  also  executes  closed-loop 
control  of  the  steering  and  drive  motors  using  pulsc-width- 

modulaicd  signals.  A  schematic  layout  of  the  battcrv  connecuons 
and  power  control  elements  is  shown  in  Figure  7. 

DESIGN  PROBLEMS  AND  SOLUTIONS  -  Throuahout  the 
design  process  for  CALVIN,  precise  design  tools  and  techniques 
were  used  whenever  possible.  Occasionally,  however,  an  event 
would  arise  where  trial-and-error  methods  were  necessary.  The 
first  arose  from  the  imprecise  geometry  of  the  base  frame.  Calvin 
began  as  a  golf  cart,  but  came  with  no  factory  dimensions. 
Measurements  were  made  and  CAD  drawings  constructed,  but  the 
angles  ofthe  fiiame  were  difficult  to  accurately  measure,  and  <!mall 
differences  had  a  great  effect  on  the  positioning  of  integral  parts. 
This  forced  a  triaJ-and-error  technique  on  mounting  the  drive 
motor,  steering  actuator  and  tachometer. 

Electrical  problems  are  also  unavoidable  on  a  vehicle  with  so 
many  components.  Floating  grounds  and  ground  loops  are  primary 
contributors  to  these  difficulties.  During  the  layout  anH  wiring  of 
the  vehicle,  an  effort  was  made  to  keep  the  wires  as  short  and 
organized  as  possible  to  facilitate  quick  troubleshooting.  Still  care 
had  to  be  taken  to  reduce  the  interference  in  many  of  the  electrical 
components,  most  noubly  the  cameras  and  the  monitors.  To  do 
this,  ground  wires  were  consolidated,  and  shielded  coaxial  wire 
was  used  when  possible. 

PERFORMANCE  PREDICTIONS 

BRAJCING  and  speed  control  -  The  most  critical 
performance  issues  were  those  that  involved  safety  requirements 
and  meeting  contest  guidelines.  Braking  and  speed  connol, 
especially  on  an  inclined  surface,  are  obviously  of  great 
imponance.  An  earlier  section  of  this  paper  describes  the  design 
of  the  braking  system  and  how  the  foot  pedal  input  was  replaced  by 
3  pncum&tic  dctufltor.  Acc^pt^blc  pcnorniflncc  w&s  assured  by 
seiccung  an  acruaior  that  supplied  the  same  brake  cable  tension  as 
a  typical  user  would  supply  through  the  brake  pedal  in  an 
emergency  condition  (rou^y  500N). 

Accuraie  speed  cononoi  is  important  for  safe  vehicle  operation 
and  for  assuring  constant  performance  of  the  vehicle  on  sloped 
surfaces.  Several  past  contestants  reported  control  problems  that 
were  directly  attributed  to  speed  variations.  Velocity  is  controlled 
oo  CALVIN  using  a  tachometer  attached  directly  to  the  output  shaft 
of  the  dnve  motor  to  provide  a  voltage  signal  proportional  to 
vehicle  speed  This  sign^  is  fed  back  to  the  raicrocontrollcr,  which 
issues  a  pulse- width-modulated  control  signal  to  the  motor  driver. 

By  modeling  the  vehicle  as  a  mass  moving  on  a  friciionless  surface, 
control  system  gains  can  be  selected  to  give  stable  velocity  control 
with  an  adequate  response  lime, 

RAMP-CLIMBING  CAPABILITIES  AND  SAND  PIT 
PERFORMANCE  -  Competition  rules  permit  ramps  or  inclined 
terrain  to  have  a  grade  of  up  to  fifteen  percent.  This  information, 
coupled  with  the  8,04  Icm/hr  maximum  allowed  speed  and  the 
vehicle  s  estimated  weight,  can  be  used  to  directly  compute  the 
minimum  required  power  of  the  drive  motor  (suicc  p>owcr  equals 
force  muiupiied  by  velocity  in  the  direction  of  the  force)  A  twenty 


Figure  6:  Sensor  and  control  mtegrauon 


percent  grade  and  a  8.04  km/hr  vehicle  speed  produce  a  vertical 
veiocicy  component  of  1.61  km/hr.  Assuming  a  1780  N  vehicle 
weight  and  no  friction  results  m  a  required  drive  motor  of  about 
0,746  kW.  Recognizing  that  steep  inclines  would  be  traversed  at 
1.61  km/hr  or  less,  this  was  taken  to  be  a  conservative  estimate,  A 
0.746  kW  continuous,  1.49  kW  peak,  24  volt  DC  drive  motor  was 
selected  and  purchased  from  a  surplus  catalog.  Usmg  a  speed 
rcducuon  of  about  10: 1  through  a  timing  belt  drive  and  the  golf  can 
didfcrcnGal,  the  desu'cd  8.04  km/hr  p>eak  speed  and  ramp«c limbing 
ability  w'cre  achieved  as  predicted.  Designing  a  vehicle  to  traverse 
a  sand  pit  was  a  less  precise  prediction,  Goodyear,  who  donated 
±c  vehicle  tires,  provided  assurances  that  these  wide,  soft  tires 
would  perform  well  in  sand  and  would  not  damage  the  grass 
poruon  of  the  course. 

REACTION  TIMES  AND  LINE-TRACKING 
PERFORMANCE  -  With  any  digital  control  system,  the  faster  the 
jpdaie  axed  of  the  controller,  the  more  stable  the  performance  of 
-he  controlled  device  under  rapidly  changing  conditions.  The 


computer  vision  and  ultrasonic  acquisition  and  computation 
requirements  obviously  limit  the  overall  update  rate  of  the 
controller.  The  team  attempted  to  partially  address  this  issue 
during  conceptual  design  by  separate,  parallel  processing  of  the 
uitrasomc  and  vision  feedback.  The  vision  feedback  processinz  is 
accomplished  directly  on  the  frame  grabber  card  or  m  the  host 
Penuum  100  MHz  personal  computer.  Operation  of  the  ultrasonic 
sensors  and  the  associated  processing  are  accomplished  in  the 
Mocorola  68HC 1 1  -based  microcontroller.  This  allows  a  maximum 
update  rate  to  the  steenng  and  drive  motors  of  about  6  Hz.  At  peak 
vehicle  speed,  this  amounts  to  an  update  about  once  every  0.3  m  of 
travel.  Through  testing,  this  has  been  detcmiincd  to  be  margmal 
for  robust  line  tracking;  one  or  two  bad  images  can  result  in  the 
vehicle  traveling  out  of  bounds.  Nevertheless,  within  the  constants 
of  the  existing  equipment,  this  is  a  reasonable  update  rate.  At 
compention,  we  intended  to  run  CALVIN  at  peak  speeds  of  about 
3.22  km/hr  to  maintain  stable  opjcration. 


Figure  7:  Power  Supply  diagram 


ovative  aspects  of  the  design 


The  design  philosophy  used  in  developing  CALVIN  did  not 
pec^fically  include  innovation  as  a  design  objective.  Our  pnmarv' 
/ojecavc  was  to  design  the  most  competitive  possible  vehicle 
-nder  the  constraints  of  the  governing  contest  rules  and  limited 
inanaal  resources.  Nevertheless,  the  resulting  design  is  believed 
0  be  innovative  in  several  significant  respects.  In  ail  cases, 
.owe^/cr,  the  innovative  features  are  intended  to  address  problems 
r  snortcomings  reported  in  similar  vehicles  from  previous 
ompeuuons. 

.A  simple  example  of  this  was  the  design  of  a  system  that  uses 
nly  three  12-volt  deep-cycle  batteries  as  the  source  for  all  on- 
oard  power.  Power  for  the  PC  and  penpheral  equipment  was 
applied  using  an  400  Watt  invener  to  generate  120  volt  AC 
ower.  Other  DC  power  requirements  were  met  through  small 
n-board  power  supplies  or  through  senes  connccuon  of  the 
attenes.  This  power  supply  arrangement  was  previously  shown  as 
igure  7.  While  this  may  seem  to  be  a  mmor  issue,  a  number  of 
revious  teams,  including  the  first-place  1995  Colorado  team 
Cifrbrd.  1995],  cited  the  maintenance  of  several  dissimilar 
aitcncs  as  a  significant  disadvantage.  The  University  of  West 
irginia  team  also  cited  the  weight  penalty  and  potential  danger 
isociated  with  their  eight-  bancry,  96- volt  DC  bus  system  [Santa, 
995]. 

A  second,  and  more  significant,  innovative  feature  of  CALVIN 
;  the  use  of  two  independent  cameras  for  line  tracking.  This 
movation  is  an  attempt  to  address,  in  a  cost-effective  way,  the 
roblems  noted  by  1994  and  1995  compeutors  with  single-camera 
✓stems  [Murphy,  1995,  Gififord,  et  al..  1995].  Originally,  the 
?am  hoped  to  use  two  cameras  and  two  independent  frame 
rabbers.  With  such  a  system,  each  camera  could  continuously 
ack  one  of  the  rwo  course  oundary  lines.  Unfortunately,  two 
ame  grabber  cards  or  a  single  muiti-inpuc  frame  grabber  proved 
)  be  too  expensive,  and  an  alternative  innovative  solution  was 
iTsued  The  end  result  was  to  use  a  simple  relay-actuated  video 
VI ten  to  change  camera  views  when  the  course  boundary  lines 
sappeared  or  became  difficult  to  detect  Using  a  line-integrity 
leck  based  on  the  size  and  grouping  of  white  image  areas,  the 
Tnpuier  vision  system  makes  three  attempts  to  find  a  line  with  the 
rmi  camera.  If  two  successive  attempts  fail,  the  vision  software 
nds  a  command  to  the  microcontroller  to  switch  to  the  camera  on 
c  other  side  of  the  vehicle  and  begin  tracking  the  opposite  line, 
osed  on  a  video  update  rate  of  at  least  six  firames  per  second  (bad 
lages  can  often  be  rejected  quickly,  resulting  in  a  higher  rate), 
id  a  maximum  speed  of  8.04kin/hr,  the  vehicle  will  travel  no 
rthcr  dian  0.244  m  in  this  time. 

A  third  innovative  feature  of  CALVTN’s  design  is  the  use  of 
radio  controller  to  generate  the  same  PWM  signals  as  the 
icrocontroilcr  for  the  steering  and  drive  motor  controllers.  This 
lows  sunple  switching  berween  autonomous  and  manual  modes, 
nich  greatly  facilitates  vehicle  setup  and  testing.  During  normal 
3craaoa,  ail  user  intcracuon  with  CALVIN  is  through  the  radio 
mirollcr  or  through  a  single  weather-resistant  control  panel 
cunted  on  the  left  rear  quarccrpanel  of  the  vehicle.  This  control 
uicl  IS  shown  in  Figure  8  below.  While  this  is  again  a  simple 
ature,  it  is  a  clear  improvement  over  the  interfaces  used  by 
■hicics  in  previous  compeutions,  many  of  which  required 
✓icw'ard  interactions  including  manual  pushing  for  placement  and 
jup. 


Figure  8:  The  control  panel 

A  fourth  innovative  feature  of  CALVIN  is  the  separation  of 
data  processing  between  the  PC  and  the  microcontroller.  Since 
data  processing  speed  has  historically  limited  other  vehicles 
maximum  speeds  [Chcok,  1 994,  Nagy  ct  al.,  1 995],  a  system  was 
designed  to  process  data  in  parallel.  By  processing  closed-loop 
velocity  and  steering  control,  as  well  as  ultrasonic  and  tactile 
sensor  data  on  CALVTN’s  microcontroller,  the  PC  is  used  solely 
for  the  computationally  intensive  vision  algorithms.  This 
separation  of  data  processing  greatly  reduces  the  total  processmg 
time,  therefore  increasing  the  maximum  sustainable  vehicle  speed- 

VEHICLE  COST  AND  FUNDING  SOURCES  -  The  tabic 
at  the  end  of  the  paper  lists  the  ready-made  components  that  were 
purchased  or  donated.  Separate  columns  list  both  the  retail  cost 
and  the  purchase  price  of  the  component  Note  that  many  items 
were  either  donated  or  sold  to  the  project  team  at  a  substantially 
reduced  cost  Also  note  that  all  fabricated  components,  such  as 
mounting  brackets,  sensor  housings,  framework,  bumper  and  the 
Lexan  outer  shell,  were  built  by  members  of  the  design  team  in  the 
student  shop  of  the  Mechanical  Engineering  Department  In 
addition,  team  members  did  a  great  deal  of  electronic  design  and 
circuit  fabrication.  It  is  estimated  that  approximately  5000  person 
hours  have  been  mvested  in  this  project  This  includes  time  spent 
on  acuviiies  such  as  project  organization,  firndraising.  publicity, 
and  research. 

COMPEXmON  RESULTS 

The  1996  Unmanned  Ground  Vehicle  competition  held  at 
Walt  Disney  World,  Epcot  Center  July  13-15,  was  composed  of 
two  sub-cxxnpctitons.  The  first  was  the  SAE  Design  Competition, 
which  evaluated  the  written  report,  an  oral  presentation  and  a  static 
viewing  for  each  vehicle. 

The  second  competition  was  the  obstacle  course  race.  All 
vehicles  were  given  three  attempts  at  completing  the  course,  and 
the  vehicle  to  complete  the  course  in  the  shortest  amount  of  time, 
or  to  proceed  the  furthest  in  any  one  run  would  be  declared  the 
winner  of  this  competition. 

Virginia  Tech’s  CALVIN  vehicle  was  declared  the  winner  of 
the  SAE  Design  Competitioa  Virginia  Tech’ s  other  vehicle, 
BOB,  scored  third  place  in  the  SAE  Design  Compctiion.  The 
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winner  of  the  obstacle  course  race  was  Ohio  State  Universttv's 
vehicle.  Virgina  Tech' s  obstacle  course  race  results  were  13th 
and  6th  for  CALVIN  and  BOB  respectively. 
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- Part  .Name  and  Descrintinn _ _ 

Base  Vehicle  -  Gas  Powered  E-Z  Go  golf  can  fused) 

Drive  Train  -  12  VDC  0.764  kw  Thermo  King  Motor  fsuralus) 

100 

Drive  Train  Comoonents  Ibelts.  oullevs.  etc.) 

75 

75 

Comouter  -  Pentium  100  Mhz  PC 

1 125 

1  125 

DSP  Frame  Grabber  (Current  Technologies) 

donated 

900 

Motorola  68HCI 1 -based  Microcontroller  f Coactive  Asthetics) 

125 

250 

Assorted  Elecmonic  Comoonents  and  Cables 

150 

150 

Polariod  Ultrasonic  Sensors  (six) 

350 

350 

Linear  Actuator  -  12  VDC  6”  stroke  (Motion  Svstems) 

donated 

300 

2  Cameras  &  Lenses  IDEI  /  Professional  Security  Alliance) 

400 

1200 

PWM  Motor  Controllers  (VANTECI 

450 

950 

Aiuminum  for  Frame  and  other  structures  (RJ.  Reynolds) 

donated 

Batteries  (Sean) 

150 

300 

Air  Tank  (Steel- Fab) 

donated 

100 

400  W  Inverter  (Trioo-Lite) 

donated 

Lexan  for  Outer  Shell  (Piedmont  Plastics) 

donated 

350 

TOTAL 

!  3.425 

6,950 

Vehicle  Cost  Breakdown 


1  May  1995 

Aug.  1995 

OcL  1995 

Dec.  1995 

Jan.  1996 

Mar.  1996 

May  1996 

July  1996 

Autonomous 

•  Golf  cart 

Base  vehicle 

Conceptual 

Spnng 

Computer 

Formation  of 

Inter-team 

Robotic 

acquired  for 

operational 

design/frame 

semester 

vision  and 

summer 

competition 

Vehicle 

base  vehicle 

and  prepared 

modifications 

teams 

ultrasonic 

teams 

between 

Project 
Introduction 
at  Virginia 
Tech 

Academic 
year  begins, 
large  project 

for 

modification 

completed 

Fail-safe 
brake  system 

organized 

System 

components 

sensors 

tested 

Automatic ' 

Systems 

integration 

completed 

CALVIN  and 

aoB 

Design  paper 

Preliminary 
design  team 
assembled 

team 

organized 

Implemented 

ordered 

Drive  and 
steering 
systems 
tested 

drive  control 
specified 

Testing  and 
subsequent 
modification 
begins 

submitted 

Departure  for 
Epcot  7/12 

4th  Annual 
UGR 

competition 
in  Eocot 

Project  Timeline 


7 


RiiFERZNCES 

Murphy,  Robin  R.,  An  Artificial  Intelliger.ce  Approach  to  the 
1994  AUVS  Unmanned  Ground  Robotics  Competition.  EEE 
Intemaiional  Coni'ercnce  on  Systems,  Man,  and  Cybernetics,  Oct., 
1995,  Vancouver,  B.C.,  Paper  0-7803-2559-1/95. 

Gifford.  K.  K.,  Deeds,  M.,  Van  der  Hoek.  A..  Henning,  F., 
Freeman,  J.,  Haussman,  G..  Kuzminsky,  S.,  Stollcr,  S.,  RoboCar: 
Sortware,  Hardware,  and  Mechanical  Design  Issues  for  the 
Uraversicy  of  Colorado  Autonomous  Rover  Vehicle,  SPE 
Voi.2591,  Sept.,  1995. 

Santa.  L.  E.,  "Undergraduate  Robot  Design  at  West  Virginia 
University,"  Mobile  Robots  X,  SPE,  Vol.  2591,  1995,  pp. 
202-208. 

Check,  K.  C.,  "Autonomous  Unmanned  Ground  Robotic  Vehicle 
Competition:  An  Intelligent  Control  Challenge,"  Proceedings  of 
the  .Amencan  Control  Conference,  Vol.  I,  1994,  pp.  383-387, 


Gifford,  K.  !C,  Deeds,  M,  van  der  Hcdc,  A.,  Henning,  F.,  Freeman, 
J..  Haussman,  G.,  Kruzminsky,  S.  and  Stoller,  S.,  "RoboCar: 
Software,  Hardware,  and  Mechanical  Design  Issues  for  the 
Lniversity  of  Colorado  .A.utonomous  Rover  Vehicle,"  Mobile 
Robots  X,  SPE,  Vol.  2591,  1995,  pp,  228-238, 

Matthews.  B.,  Ruthemeyer,  M.,  Perdue,  D.  and  Hall.  E.  L., 
"Development  of  a  Mobile  Robot  for  the  1995  AUVS 
Competition,"  Mobile  Robots  X,  SPE  Vol.  2591  1995  pp 
194-201. 

Murphy,  R.  R.,  Hoff,  W.,  Blitch,  J.,  Gough,  V.,  Hawkins,  D., 
Hoffinan,  J.  C.,  Kroslcy,  R,  Lyons,  T.,  Mali.  A.,  MacMillan,  J.  and 
Warshawsky,  S.,  'Colorado  School  of  Mines  Behavioral  Approach 
to  the  1995  UGR  Compeution,"  Mobile  Robots  X,  SPE,  1383, 
November  1990,  pp.  436-447. 

Nagy,  P .  V.  and  Bock,  T.,  "The  Northern  Illinois  University 
Autonomous  Mobile  Robots."  Mobile  Robots  X.  SPE  Vol  259 1 
1995,  pp.  209-219. 


8 


Navigation  of  an  autonomous  ground  vehicle  using  the  subsumption  architecture 

Paul  J.  Johnson,  Kevin  L.  Chapman.  John  S.  Bay 

Machine  Intelligence  Laboratory,  Bradley  Department  of  Electrical  Engineering, 

Virginia  Polytechnic  Institute  and  State  University,  Blacksburg,  Virginia  24061-01 1 1 


ABSTRACT 

The  subsumption  iirchitecture  is  used  to  provide  an  autonomous  vehicle  with  the  means  to  stay  within  the 
hound^iries  of  a  course  while  avoiding  obstacles.  A  three-layered  network  has  been  devised  incorporating  computer 
vision,  ultrasonic  ranging,  and  tactile  sensing.  The  computer  vision  system  operates  at  the  lowest  level  of  the 
network  to  generate  a  preliminary  vehicle  heading  based  upon  detected  course  boundaries.  The  network’s  next 
level  performs  long-range  obstacle  detection  using  an  array  of  ultrasonic  sensors.  The  range  map  created  by  these 
sensors  is  used  to  augment  the  preliminary  heading.  At  the  highest  level,  tactile  sensors  are  used  for  shon-range 
obstacle  detection  and  serve  as  an  emergency  response  to  obstacle  collisions.  The  computer  vision  subsystem  is 
implemented  on  a  personal  computer,  while  both  ranging  systems  reside  on  a  microcontroller.  Sensor  fusion 
within  a  subsumption  framework  is  also  executed  on  the  microcontroller.  The  resulting  outputs  of  the  subsumption 
network  are  actuator  commands  to  control  steering  and  propulsion  motors.  The  major  contribution  of  this  paper  is 
iis  a  case  study  of  the  application  of  the  subsumpuon  architecture  to  the  design  of  an  autonomous  ground  vehicle. 

Keywords:  control  systems,  subsumption  architecture,  sensor  fusion,  autonomous  vehicles,  obstacle  course 
navigation,  computer  vision,  ultrasonic  ranging 

1.  INTRODUCTION 


This  paper  describes  the  control  system  design  issues  encountered  by  the  Virginia  Tech  Autonomous  Vehicle  Teajn 
in  its  development  of  two  autonomous  ground  vehicles.  These  vehicles  were  built  for  entry  in  the  Fourth  Annual 
International  Autonomous  Ground  Robotics  Vehicle  Competition,  which  was  sponsored  by  the  Association  of 
Uniminiied  Vehicle  Systems  International  (AUVSI)  and  was  held  at  the  Epcot  Center  in  Orlando,  Florida  on  July 
13-15,  1996.  The  goiil  of  the  competition  was  to  have  an  unmanned  ground  vehicle  navigate  an  outdoor  obstacle 
course  roughly  500  feet  long  in  no  more  than  ten  minutes,  while  obeying  a  speed  limit  of  five  miles  per  hour. 
White  iuid  yellow  Line  markers  were  painted  on  the  ground  to  define  the  course’s  boundaries,  while  hay  bales 
served  obstacles 

Successful  obstacle  course  navigation  requires  the  ability  to  detect  course  boundaries  and  obstacles,  and  then  make 
intelligent  control  decisions.  While  it  would  be  possible  to  gather  ail  needed  boundary  and  obstacle  information 
using  a  single  sensory  device,  such  as  a  camera  attached  to  a  computer  vision  system,  it  is  more  likely  that  multiple 
types  of  sensors  would  be  used.  This  would  allow  the  vehicle  to  utilize  the  abilities  of  very  effective,  but  highly 
specific,  sensory  devices.  The  problem  that  then  arises  is  that  of  combining  these  different  sensing  modalities  into 
a  limited  number  of  control  signals.  This  is  known  as  the  sensor  fusion  problem 

One  solution  to  the  sensor  fusion  problem  can  be  found  in  the  subsumption  architecture  This  architecture 
decomposes  a  control  problem  into  a  collection  of  behaviors.  Individual  behavior  modules  are  developed  to  process 
sensory  data  so  as  to  create  a  simple  behavior,  such  as  line  following.  Multiple  modules,  each  with  potentially 
different  sensory  inputs,  are  then  developed  to  produce  additional  behaviors.  Through  the  use  of  a  hierarchy, 
outputs  of  some  modules  are  allowed  to  subsume  the  outputs  of  other  modules,  thus  creating  a  single  actuator 
command  from  multiple  sensory  inputs. 

The  focus  of  this  paper  is  the  use  of  the  subsumption  architecture  to  perform  sensor  fusion  for  the  control  of 
autonomous  ground  vehicles.  General  design  issues  for  autonomous  navigation  of  an  obstacle  course  are  first 
discussed,  with  an  emphasis  on  sensor  fusion.  A  brief  summary  of  the  subsumption  architecture  is  then  given,  with 
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a  descripticiQ  of  the  implementation  of  the  subsumption  ;irchitecture  on  two  different  vehicles.  A  discussion  of 
future  tireas  of  work  is  also  included. 

2.  autonomous  NAVtCATTON  ISSUES 

2.1  Information  needed  for  obstacle  course  navigation 

Mc^my  issues  ru-e  involved  in  designing  an  autonomous  vehicle  to  navigate  an  obstacle  course  of  the  type  established 
by  the  AUVSI  for  the  vehicle  competition.  These  issues  can  be  divided  into  various  functions,  such  as  gathering  all 
necessruy  sensor  data,  processing  sensor  data,  and  applying  control  signals  to  the  vehicle’s  actuators,  or  they  can 
be  divided  into  various  behaviors,  such  as  path  following  and  obstacle  avoidance 

No  matter  how  the  issues  Jire  divided,  the  vehicle  must  be  able  to  detect  the  course’s  boundaries  and  the  multiple 
obstacles  it  may  encounter.  How  the  boundaries  are  defined  and  what  one  wishes  the  vehicle  to  do  with  boundary 
infonnation  dictate  which  technologies  are  candidates  for  boundary  detection.  Computer  vision  can  detect 
boundary  lines  both  near  and  far,  providing  the  vehicle  with  information  about  its  present  location  and  its  possible 
future  locations.  An  infrared  erniiier/detector  pair  can  also  be  used  for  boundary  detection.  This  method  relies  on 
the  reflection  of  an  emitted  signal  and  thus  the  emitter/detector  must  be  in  close  proximity  to  the  line.  Using  this 
method,  it  would  be  very  difficult  to  get  any  knowledge  about  where  the  vehicle  should  go  in  the  distant  future. 

The  vehicle  must  also  be  able  to  detect  obstacles.  Ideally,  proper  detection  should  allow  obstacles  to  be  avoided 
^litogether.  Detecting  obstacles  from  a  longer  distance  is  beneficial,  as  more  time  is  available  to  take  evasive 
action.  Computer  vision  systems  can  be  used  to  detect  the  location  of  obstacles  at  significant  distances,  but 
extensive  image  processing  may  be  required  for  both  obstacle  recognition  and  distance  calculations  to  the  identified 
obstacle.  A  simpler  approach  is  to  use  soundwaves  from  ultrasonic  transducers  to  locate  obstacles.  These  sensors 
have  nice  distance  detection  capabilities,  but  because  they  tend  to  have  limited  fields  of  view,  an  array  of  ultrasonic 
sensors  is  usually  needed  to  provide  sufficient  coverage 

In  the  1996  competition,  collisions  with  an  obstacle  were  acceptable,  but  moving  an  obstacle  was  not.  Tactile 
sensors  can  be  used  to  provide  information  about  collisions.  As  with  the  ultrasonic  sensors,  an  array  of  tactile 
sensors  should  be  used  to  determine  where  on  the  vehicle  the  collision  occurred.  This  allows  corrective  measures 
to  be  taken  to  avoid  hitting  the  obstacle  a  second  time. 

2.2  Sensor  fusion 

Autonomous  navigation  most  likely  will  involve  gathering  and  processing  data  from  a  variety  of  sensor  types.  In 
the  previous  section,  computer  vision,  infrared  emitter/detecior  pairs,  and  ultrasonic  and  tactile  sensors  were 
discussed  in  relation  to  detecting  boundary  lines  and  obstacles  in  the  context  of  obstacle  course  navigation.  In 
general,  laser  rangefinders,  inclinometers,  photosensitive  resistors.  Global  Positioning  Systems  (GPSs),  and  many 
more  devices  may  be  used  to  provide  information  to  a  vehicle.  Combining  ail  of  this  information  in  an  intelligent 
and  organized  fashion  is  called  sensor  fusion. 

It  is  conceivable  that  sensor  fusion  could  be  avoided  altogether  in  obstacle  course  navigation  by  using  only 
computer  vision.  However,  there  are  numerous  disadvantages  to  this  approach: 

•  Computer  vision  is  not  trivial,  as  complex  image  processing  can  be  computationally  intensive 

•  Other  sensing  technologies  may  be  better  at  performing  a  few  specific  tasks,  such  as  tactile  sensors  for  obstacle 
collision  detection 

•  Diverse  and  unpredictable  lighting  conditions  diminish  the  reliability  of  a  computer  vision  system 

•  In  genercd,  using  a  single  sensing  modality  provides  a  single  point  of  failure 

A  system  capable  of  performing  sensor  fusion  addresses  all  of  the  points  listed  above.  Perhaps  the  biggest 
adviinuige  of  such  a  system  is  that  sensor  multiplicity  can  provide  redundancy.  Sensor  uncertainty  and  error  will 


exist  in  autonomous  vehicle  navigation  Redundtincy  helps  to  lessen  the  htinnful  effects  of  sensor 

uncertainty  and  error. 

.Ajiother  advantage  of  sensor  fusion  is  its  inherent  degree  of  inodukuity.  Creating  a  system  that  is  modular  has 
mtiny  benefits  in  both  the  initial  design  stage  and  later  refinements.  Ideally,  as  more  sensor  information  becomes 
available,  this  infonnation  should  be  able  to  fuse  with  existing  data  easily,  without  requiring  large  changes  to 
current  software  tmd  hardware.  Sensor  fusion  techniques  are  based  on  this  ability  to  combine  multiple  modules 
into  a  single  system. 

2.3  Reactive  systems  versus  planning  systems 

The  actions  of  tm  autonomous  vehicle  can  be  the  result  of  two  general  types  of  control  structures:  reactive  or 
plannin,^.  Both  of  these  control  structures  use  sensory  infonnation  to  determine  actions.  A  planning  control 
strategy  ^issuines  that  the  vehicle  has  a  world  model  on  which  to  base  all  of  its  actions.  Information  is  sensed, 
processed  based  on  the  world  model,  and  then  an  action  is  generated.  The  world  model  may  be  known  a  priori^  or 
alternatively,  may  be  developed  from  information  gathered  by  the  vehicle's  sensory  inputs,  A  reactive  system 
differs  in  that  it  uses  no  world  models  and  its  sensory  inputs  are  subjected  to  minimal  processing  before  generating 
outputs,  leading  to  a  quick  response  from  a  given  stimulus. 

In  most  cases  of  autonomous  navigation,  a  detailed  world  model  will  not  be  known  a  priori.  If  a  planning  system 
is  to  be  used,  a  model  of  the  world  must  be  built  from  sensory  inputs.  Unfortunately,  sensory  systems  can  typically 
only  provide  a  partial  picture  of  the  environment  Sensory  information  gathered  is  also  inherently  noisy,  as 
sensors  may  not  be  operating  at  ail  times  under  ideal  conditions.  This  can  lead  to  inaccurate  world  models  and 
poor  performance  in  a  planning  system. 

Reactive,  or  behavior-based,  systems  do  not  require  the  development  of  a  world  model  and  are  relatively  simple 
compared  to  planning  systems.  Participants  in  previous  competitions  have  proposed  using  reactive  systems  due  in 
part  to  the  relative  simplicity  of  such  designs  The  behavior-based  approach  is  also  ideally  suited  to  obstacle 
course  navigation,  as  the  task  can  be  decomposed  into  distinct  behaviors  such  as  path  following,  line  following, 
obstacle  avoidance,  etc. 


3.  SUBSUMPTION  ARCHITECTURE  OVERVIEW 


The  subsumption  architecture  is  a  behavior-based  control  scheme  developed  for  controlling  autonomous  mobile 
robots  Control  of  an  autonomous  mobile  robot  will  inevitably  involve  multiple  goals,  some  of  which  may 
require  conflicting  actuator  demands.  Any  successful  architecture  must  have  a  means  of  arbitrating  these 
conflicting  demands.  The  subsumption  architecture  does  this  by  taking  a  behavior-based  approach  to  decomposing 
the  entire  control  problem. 

A  typical  approach,  as  shown  in  Figure  3.1,  is  to  break  down  the  problem  into  a  series  of  functional  units,  where 
each  unit  performs  a  specific  function.  For  an  autonomous  vehicle  to  navigate  an  obstacle  course,  these  functions 
may  include  gathering  sensor  data,  processing  sensor  data,  making  logical  steering  and  drive  decisions,  and 
applying  control  commands  to  steering  and  drive  actuators.  There  are,  however,  multiple  disadvantages  to  this 
type  of  architecture: 

•  The  details  of  every  module  must  be  considered  before  the  construction  of  any  individual  module  can  proceed. 

•  Module  interfaces  are  extremely  important  because  of  the  moduie-to-module  flow  of  information. 

•  A  problem  in  one  module  can  lead  to  a  cascade  failure  in  other  modules  that  depend  on  the  output  of  the 
damaged  module. 

The  subsumption  architecture  takes  an  alternate  approach  by  decomposing  the  problem  into  parallel,  task- 
achieving  behaviors,  as  shown  in  Figure  3.2.  Individual  behaviors  are  generated  from  modules,  each  of  which  is  a 
simple,  bisynchronous  computational  machine.  An  entire  control  system  is  then  constructed  using  layers  of  these 


behavior  modules.  The  vehicle  achieves  a  certain  level  of  competence  for  each  layer  in  its  control  system.  As  more 
modules  tu-e  added,  the  overall  level  of  competence  of  the  vehicle  increases.  Higher  layers  produce  more  specific 
desired  behaviors  and  can  subsume  lower  layers  by  suppressing  the  lower  layers’  outputs.  For  autonomous  obstacle 
course  navigation,  these  behaviors  may  include  path  following,  obsuacle  avoidance,  and  reaction  to  obstacle 
collisions.  This  style  of  architecture  addresses  all  of  the  problems  associated  with  the  series  architecture  of  Figure 
3.1:  modules  i\ic  constructed  individually,  module  interfacing  is  not  elaborate,  and  failures  in  one  module  have  a 
minimal  affect  on  other  modules. 


Figure  3.1  A  series  of  functional  modules  for  obstacle  course  navigation 
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Figure  3.2  Parallel  behavior  modules  for  obstacle  course  navigation 

The  subsumption  architecture  essentially  involves  fixed  priority  assignments  between  behaviors.  The  commands 
belonging  to  the  highest  priority  behavior  are  applied  to  the  robot's  actuators.  The  designer  of  such  a  system  must 
decide  on  the  number  of  behavior  levels  and  what  priority  to  assign  to  each  level.  The  ordering  of  the  modules  in 
Figure  3.2  shows  one  possible  behavior  decomposition  and  hierarchy  for  the  obstacle  course  navigation  problem. 

4.  VEHICLE  DESICfNS 


Past  competitors  have  designed  vehicles  using  computer  vision  to  detect  the  course's  boundaries  and  ultrasonic 
sensors  to  detect  the  hay  bale  obstacles  The  Virginia  Tech  Autonomous  Vehicle  Team  used  a  similar 

approach  in  designing  two  vehicles,  BOB  (Beast  of  Burden)  and  CALVIN  (Computerized  Autonomous  Land 
Vehicle  with  Intelligent  Navigation). 

Figure  4.1  provides  an  overhead  view  of  the  two  vehicles,  showing  their  relative  sizes  and  the  positions  of  the 
cameras  and  ultrasonic  sensors.  BOB  used  a  personal  computer  with  a  frame  grabber  card  to  interface  with  a 
single  camera,  which  was  positioned  near  the  middle  of  the  vehicle  and  directed  forward.  CALVIN  was  equipped 
with  two  cameras,  each  looking  directly  out  to  opposite  sides  of  the  vehicle.  These  cameras  were  positioned  near 
the  front  of  the  vehicle.  An  electronic  video  switch  was  used  to  connect  either  one  of  these  two  cameras  to  a  frame 
grabber  card  in  a  personal  computer.  Both  vehicles  used  a  Motorola  68HCll-based  microcontroller  to  interface  to 
iin  luray  of  five  ultrasonic  sensors  and  to  three  tactile  sensor  banks.  The  positions  of  the  ultrasonic  sensors  were 
different  for  each  vehicle  due  to  the  different  vehicle  widths  and  the  different  bumpers  on  each  vehicle.  The  three 
tactile  sensor  banks  were  connected  to  the  three  segments  of  the  front  bumpers  on  each  vehicle. 
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Figure  4.1  Top  view  showing  relative  sizes  and  configurations  of  the  two  vehicles 

Figure  4.2  shows  a  block  diagram  of  the  control  systems  for  the  Virginia  Tech  autonomous  vehicles.  Three 
different  sensing  modalities  were  combined  into  reference  signals  using  the  subsumption  architecture  to  control  a 
steering  motor  ^md  a  drive  motor.  Sensor  fusion  is  inherent  in  the  subsumption  architecture,  where  outputs 
resulting  from  one  sensor  can  be  subsumed  by  outputs  resulting  from  another  sensor.  A  previous  competitor  has 
stated  that  obstacle  course  navigation  is  ideally  suited  to  subsumption  architecture  because  the  task  can  be  broken 
down  into  behaviors  such  as  stay  within  the  lines,  follow  a  specific  line,  and  avoid  obstacles. 

Figure  4.3  shows  the  sensor  fusion  block  of  Figure  4.2.  The  bottom  layer  was  the  Path  Following  module,  which 
used  computer  vision  to  create  a  path  following  behavior.  The  middle  layer  was  the  Obstacle  Avoidance  module, 
which  used  ultrasonic  sensors  to  avoid  obstacles.  At  the  highest  level  was  the  Emergency  Obstacle  Avoidance 
module,  where  tactile  sensors  were  used  to  trigger  an  emergency  response  if  an  obstacle  was  hit. 
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Figure  4.2  Autonomous  vehicle  control  system 
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Figure  4.3  Subsumption  architecture  approach  to  sensor  fusion 


Sensor  fusion  wns  accomplished  on  the  microcontroller.  The  Path  Following  module  was  realized  on  the  personal 
computer,  with  its  output  sent  to  the  microcontroller  via  a  serial  connection.  The  Obstacle  Avoidance  and 
Emergency  Obstacle  Avoidance  modules  operated  solely  on  the  microcontroller.  Software  on  the  microcontroller 
allowed  the  middle  layer  to  subsume  the  lowest  layer.  The  interrupt  capabilities  inherent  in  the  hardwm-e  of  the 
microcontroller  were  used  to  billow  the  highest  layer  to  subsume  the  two  lower  layers’  outputs.  The  next  three 
subsections  of  this  chapter  provide  additional  details  about  each  behavior  module. 

Figures  4.4  ^ind  4.5  show  schematic  diagrams  of  the  integration  of  the  personal  computer  and  microcontroller  with 
the  sensors  and  conu-ollers  for  the  two  vehicles.  BOB’s  drive  motor  controller  performed  closeddoop  velocity 
conu‘01,  eliminating  the  need  for  any  velocity  feedback  to  the  microcontroller  in  Figure  4.5. 


Figure  4.4  Computer  integration  of  sensing  and  control  for  CALVIN 


Figure  4.5  Computer  integration  of  sensing  and  control  for  BOB 
4.1  Boundary  detection  through  computer  vision:  Path  Following  module 

On  both  vehicles,  the  input  to  this  module  was  a  camera  image  and  the  output  was  a  preliminary  steering  angle  for 
keeping  the  vehicle  within  the  boundaries  of  the  course.  The  internal  operations  of  this  module,  however,  were 
significantly  different  between  vehicles  because  each  used  a  different  camera  configuration. 

On  BOB,  there  was  just  a  single  camera  directed  straight  ahead.  Since  one  would  expect  boundary  lines  to  extend 
up  the  image  pLme,  a  weighted  sum  of  pixels  in  each  column  was  taken,  with  the  goal  being  to  find  two  modes  in 


this  column  data.  These  modes,  representing  cin  estunate  of  the  boundary  locations,  were  used  to  create  a 
preliminary  steenng  tmgle  that  moved  the  vehicle  towtird  the  horizontal  average  of  the  two  modes.  The  horizontal 
average  tipproximated  the  center  of  the  course. 

On  C.\LVIN,  tw’o  ciuneras  were  used,  each  directed  outward  at  a  right  angle  to  the  vehicle’s  forward  direction. 
One  would  expect  boundary  lines  to  appear  as  horizontal  lines  at  a  certain  vertictil  position  in  these  images.  If  the 
detected  line  wiis  not  horizontal.  CALVIN  was  not  parallel  to  the  boundary  line.  Similarly,  if  the  line  wtis 
horizontal  but  not  at  the  correct  vertical  position,  then  CALVIN  was  either  too  close  or  too  far  away  from  the 
boundary.  A  steering  angle  wtis  chosen  to  properly  orient  the  vehicle  based  upon  the  angle  and  vertical  location  of 
the  detected  line.  If  consecutive  images  were  bad  from  one  camera,  a  signal  was  sent  to  the  microcontroller  to  have 
it  switch  cmneras,  providing  a  degree  of  redundancy. 

4.2  Obstacle  detection  through  ultrasonic  sensors:  Obstacle  Avoidance  module 

On  both  vehicles,  five  ultrasonic  sensors  located  in  the  front  of  the  vehicle  were  used  to  gathered  distance  data  to 
the  obstacles.  A  collection  of  IF-THEN  rules  was  used  to  analyze  the  distance  values  returned  by  the  sensors. 
These  rules  determined  the  steering  angle  needed  to  avoid  any  obstacles  in  the  vehicle’s  path.  On  BOB.  the  output 
of  this  module  was  a  steering  angle  that  absolutely  replaced  (subsumed)  the  preliminary  heading  from  the  Path 
Following  module.  On  CALVIN,  the  output  was  either  an  absolute  steering  angle  or  a  "steering  delta”  that  was 
added  to  the  preliminary  heading  from  the  Path  Following  module.  The  resulting  sum  would  then  be  the  final 
steering  angle.  This  part  of  the  sensor  fusion  was  handled  in  software  by  the  microcontroller. 

4.3  Obstacle  detection  through  tactile  sensors:  Emergency  Obstacle  Avoidance  module 

Tactile  sensors  placed  on  the  front  bumpers  of  both  vehicles  provided  obstacle  collision  detection.  Using  input 
capture  pins  on  the  microcontroller,  interrupt  service  routines  (ISRs)  were  executed  in  response  to  a  signal  from 
tiny  of  three  banks  of  wired-OR  tactile  sensors.  These  ISRs  effectively  subsumed  the  outputs  of  the  Path  Following 
iind  Obstacle  Avoidance  modules  by  forcing  the  vehicle  to  execute  a  "backing  up”  maneuver.  This  part  of  the 
sensor  fusion  was  htmdled  easily  because  of  the  ISR  capabilities  associated  with  the  input  capture  pins  of  the 
microcontroller. 

4.4  Summary  of  the  vehicles’  performance 

The  subsumption  architecture  proved  to  be  a  good  method  for  achieving  sensor  fusion.  The  inherent  modularity  of 
this  architecture  made  it  an  ideal  choice  for  Virginia  Tech’s  initial  experiences  in  developing  autonomous  ground 
vehicles.  The  various  behavior  modules  were  developed,  tested,  and  refined  independently,  greatly  reducing  the 
effort  required  to  fully  integrated  all  sensor  systems  of  the  autonomous  vehicles.  At  the  competition  in  July,  BOB 
iind  CALVIN  placed  6th  and  I2th,  respectively,  out  of  17  participants. 

The  biggest  problems  encountered  in  performance  trials  were  presented  by  shadows  and  glare  from  the  sun.  The 
computer  vision  system  on  each  vehicle  had  difficulty  extracting  the  boundary  lines  from  a  captured  image  when 
these  tutifacts  were  present.  In  both  circumstances,  the  image  preprocessing  required  could  not  sufficiently  remove 
the  lutifact  while  tracking  the  boundary  lines.  Both  vehicles  performed  well  when  the  sun  was  not  shining 
brightly.  In  this  respect,  the  subsumption  architecture  was  shown  to  work  when  the  Path  Following  module  could 
successfully  preprocess  the  captured  images.  This  also  emphasizes  the  importance  of  the  bottom  layer  in  a 
subsumption  architecture.  This  behavior  should  be  the  result  of  a  very  robust  and  fully  operational  module. 

5.  FUTURE  WORK 

The  control  systems  on  the  vehicles  performed  well,  but  modifications  are  already  being  considered  to  improve  the 
overall  performance  of  the  vehicles.  Future  work  will  focus  on  three  areas:  1)  refining  the  three  individual 
modules  of  the  subsumption  network  shown  in  Figure  4.3,  2)  modifying  the  sensor  inputs  to  these  modules,  and  3) 
adding  additional  modules  to  the  network. 


5.1  Modifications  within  current  modules 

Within  the  Path  Following  module,  modifications  have  been  proposed  to  use  more  sophisticated  computer  vision 
algorithms  for  line  detection.  The  use  of  anificial  neural  networks  (ANNs)  for  pattern  recognition  is  one  option 
being  explored.  ANNs  have  the  ability  to  extract  underlying  geometric  shapes  from  a  noisy  image,  providing  a 
degree  of  disturbance  rejection  that  could  counter  the  harmful  effects  of  shadows  c^md  sun  glare.  Gathering  training 
data  that  is  truly  represenuative  of  the  numerous  lighting  conditions  that  may  be  encountered  in  competition  could 
pose  a  problem  for  this  approach. 

The  collection  of  IF-THEN  rules  used  within  the  Obstacle  Avoidance  module  could  also  be  modified.  Rather  than 
using  a  hu-ge  collection  of  crisp  rules,  a  fuzzy  rule-base  has  been  proposed.  The  output  of  this  system  would  still  be 
a  steering  delta  or  an  absolute  steering  angle,  but  only  a  small  number  of  fuzzy  rules  would  be  used.  From  this 
small  fuzzy  system,  a  wide  vtiriety  of  responses  would  be  generated.  The  possibility  also  exists  to  train  a  system  to 
learn  a  set  of  fuzzy  rules  that  can  most  effectively  handle  the  situations  encountered  in  obstacle  course  navigation. 

5.2  Sensory  device  modifications 

Alternate  sensory  devices  for  the  behavior  modules  of  Figure  4.3  are  also  being  considered.  The  use  of  a  color 
camera  to  provide  the  input  to  the  Path  Following  module  is  being  considered  to  address  the  problems  associated 
with  sun  glare.  Grass  without  any  paint  was  seen  to  reflect  sunlight  as  much  as  blades  of  grass  that  had  been 
ptiinted  white  (or  similarly  covered  with  lime  powder).  With  a  color  camera,  the  intense  green  reflections  could  be 
filtered  out  while  leaving  the  intense  white  reflections  from  the  lines. 

5.3  Additional  modules 

Line  detection  in  an  outdoor  setting  using  computer  vision  is  not  a  trivial  task.  It  proved  to  be  the  system 
component  most  prone  to  failure  in  testing  and  performance  trials.  Therefore,  the  addition  of  a  second  module  for 
boundary  detection  is  planned.  The  detector  used  with  this  module  would  function  at  short  range  and  would  serve 
as  .'m  indication  that  the  vehicle  is  very  close  to  a  course  boundary.  The  position  of  the  Emergency  Path  Following 
module  in  Figure  5.1  shows  the  relative  priority  of  the  behavior  generated  from  this  module. 
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Figure  S.l  Subsumption  architecture  with  additional  module  providing  redundant  line  detection 
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Abstract 

In  this  paper,  we  present  a  new  distributed  artificial 
intelligence  (DAI)  architecture  we  call  the  Distributed 
Learning  Classifier  System  (DLCS).  The  DLCS  is  an 
extension  of  the  learning  classifier  system  (LCS),  with 
specific  architecture  additions  for  network  message  passing. 
In  order  to  illustrate  the  effectiveness  of  the  DLCS  paradigm 
in  multiple-agent  scenarios,  we  provide  a  solution  to  the 
multiple-agent  animat  problem  using  the  DLCS. 

L  Introduction 

The  term  “machine  learning”  is  used  to  describe  a 
vast  array  of  algorithms,  methods,  and  paradigms  which 
attempt  to  solve  tasks  that  run  the  gamut  from  speech 
recognition  to  financial  analysis  to  robot  control.  Machine 
learning  paradigms  tend  to  fall  into  one  of  three  categories: 
neural  modeling,  symbolic  concept  acquisidon,  or  domain- 
specific  learning  [1].  The  neuron-like  networks  present  in 
neural  modeling  techniques  provide  fast,  reladvely  simple 
mappings  from  input  to  outpuL  Symbolic  concept 
acquisidon  encompasses  a  broad  range  of  artificial 
intelligence  methods  in  which  a  machine  solves  a  task  by 
learning  previously  unconnected  concepts  using  a 
predefined  symbolic  noiadon.  Domain-specific  learning 
involves  a  large  amount  of  a  priori  knowledge  about  the 
task,  and  these  algorithms  focus  on  using  an  extensive 
knowledge-base  to  solve  the  problem  [2].  In  attempting  to 
solve  a  particular  task,  one  would  often  prefer  an  approach 
that  provides  a  compromise  between  these  three  techniques. 
The  Learning  Classifier  System  (LCS),  proposed  by  John 
Holland  [3,4],  provides  such  a  compromise. 


The  LCS  is  a  rule-based,  message-passing,  machine 
learning  paradigm  designed  to  process  task-environment 
stimuli,  much  like  the  input-to-output  mapping  provided  by 
a  neural  network.  In  addition  to  neural-like  mapping,  the 
LCS  provides  learning  through  genetic  and  evolutionary 
adaptation  to  changing  task  environments.  LCS  concepts 
are  “subsymbolic,”  meaning  that  they  are  encoded  by  the 
system  itself  and  not  the  designer.  The  LCS  can  still  be 
programmed  with  specific  domain  knowledge,  however,  and 
this  duality  provides  much  design  flexibility  [5]. 

The  purpose  of  this  paper  is  to  introduce  the 
Distributed  Learning  Classifier  System  (DLCS)  architecture 
as  a  method  for  using  the  LCS  in  a  multiple-agent  setting. 
As  an  extension  of  the  traditional  learning  classifier  system, 
the  DLCS  provides  a  set  of  rules  for  interfacing  to  a 
standard  network  so  that  multiple  LCS  agents  can  work 
collectively,  while  still  executing  individually.  Collective 
task  solution  is  at  the  heart  of  distributed  artificial 
intelligence  (DAI)  research  [6].  However,  the  DLCS  differs 
from  other  DAI  architectures  in  that  it  requires  no  central 
control,  and  is  therefore  more  suited  for  tasks  involving 
multiple,  autonomous  agents.  Also,  since  the  DLCS  uses  a 
network-like  message-passing  scheme,  standard  network 
protocols  can  be  used  to  connect  agents.  We  begin  our 
discussion  by  providing  a  very  brief  overview  of  the 
traditional  learning  classifier  system  and  the  current  research 
in  the  area.  We  then  present  a  detailed  discussion  of  the 
DLCS  and  illustrate  its  application  to  a  multiple-agent 
problem. 

IL  Background 

The  learning  classifier  system  consists  of  a  list  of 
rules  or  classifiers  that  provide  a  set  of  possible  actions  for  a 
given  problem  scenario.  Each  rule  has  one  or  more 
condition  words  and  an  action  word.  The  system  operates 
by  reading  messages  from  a  task  environment  interface, 
comparing  those  messages  to  the  conditions  of  the  rules  in 
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the  classifier  list,  and  posting  the  corresponding  action 
messages  back  to  the  task  environment.  This  sequence  of 
operations  is  referred  to  as  an  execution  cycle  and  is 
repeated  until  a  particular  environment  state  is  reached. 
Messages  from  and  to  the  environment  are  stored  on  a 
message  board.  Since  the  message  board  is  of  finite  length, 
rules  are  probabilistically  selected  using  a  bidding  process 
based  on  a  figure  of  merit  called  strength.  Each  rule  has  a 
strength  assigned  to  it,  and  rules  whose  conditions  match  the 
environment  messages  are  given  a  bid  value  based  on  this 
strength.  Rule  strengths  are  adjusted  by  the  task 
environment  pay ojf  function  based  on  the  appropriateness  of 
the  rule  to  solving  the  task  at  hand.  Rule  strengths  are  also 
adjusted  by  the  Bucket  Brigade  Algorithm  (SSA),  an 
algorithm  designed  to  encourage  rule  chaining.  Rule  chains 
form  when  an  action  message  from  the  previous  iteration  of 
the  execution  cycle  causes  an  action  to  get  selected  on  the 
current  iteration.  In  order  to  provide  a  method  for  exploring 
new  rules,  the  rules  in  the  classifier  list  are  periodically 
modified  by  a  genetic  algorithm  (GA)  which  employs  the 
reproduction  and  mutation  generic  operators.  The 
combination  of  the  GA  {rule  discovery)  and  rule  strength 
adjustment  {credit  assignment)  enable  the  LCS  to  leam  new 
concepts.  Credit  assignment  and  rule  discovery  are 
performed  after  actions  are  posted  to  the  environment  For  a 
more  detailed  discussion  of  the  LCS,  see  [3]  and  [4]. 

Holland  began  development  of  the  LCS  in  1971 
[7].  Since  then  it  has  undergone  modest  experimentation. 
Some  of  the  more  recent  work  includes  the  application  of 
the  LCS  to  letter  sequence  prediction  by  Robertson  and 
Riolo  [8],  multiplexer  truth  function  learning  by  Wilson  [9], 
predictive  behavior  learning  by  Carse  [10],  environment 
variable  storage  by  Shu  and  Schaeffer  [11],  and  learning  by 
analogy  by  Zhou  and  Grefenstette  [12].  Each  of  these  tasli 
focused  on  the  use  of  a  single  LCS.  Dorigo  and  Schnepf,  on 
the  other  hand,  explored  the  use  of  multiple  learning 
classifier  systems  in  robotics,  with  each  system  controlling  a 
different  aspect  of  the  robot.  While  they  show  that  this 
control  approach  is  effective,  their  setup  does  not  explicitly 
pass  messages  between  classifier  systems  [13].  The  DLCS 
presented  here  goes  beyond  current  learning  classifier 
system  research  by  introducing  true  networking  into  the  LCS 
framework. 

m.  The DLCS 

A.  Overview 

The  DLCS  extends  the  standard  learning  classifier 
system  with  the  addition  of  a  network  interface,  as  shown  in 
Figure  1.  The  figure  indicates  message  flow  with  solid 
arrows  and  system  control  with  dashed  arrows.  The 
execution  cycle  operates  by  reading  messages  from  the 
environment  through  the  input  interface,  selecting 
appropriate  actions  from  the  classifier  list,  and  posting 
actions  back  to  the  environment  via  the  output  interface. 
With  the  DLCS,  however,  we  extend  this  message  passing  to 


the  network.  Now,  messages  can  come  from  the  input 
interface  and  the  network  interface  and  actions  can  get 
posted  to  the  output  interface  and  to  the  network.  The 
network  can  also  introduce  new  rules  into  the  classifier  list. 
We  describe  in  detail  these  two  cases,  as  well  as  a  third 
message-passing  case,  in  the  next  two  sections.  Since 
networks  are  asynchronous  and  tend  to  have  a  delivery  delay 
that  depends  on  external  factors  such  as  network  congestion, 
available  bandwidth,  and  transmission  rate,  the  network 
interface  contains  transmit  and  receive  queues  for  the 
purpose  of  buffering  network  messages  so  that  the  LCS  does 
not  have  to  wait  for  network  response  to  continue  execution. 
These  queues  are  simply  FIFO  buffers  which  store  outgoing 
and  incoming  messages. 


Figure  1.  The  DLCS 


B.  Network  Message  Types 

In  order  to  provide  versatile  and  effective 
communication  over  the  network,  the  DLCS  paradigm 
provides  three  types  of  messages  that  can  be  passed  over  the 
network:  classifier  messages,  action  messages,  and  BBA 
strength  adjustment  messages.  Classifier  message  passing 
occurs  when  an  agent  finds  a  useful  rule  and  shares  this  rule 
with  the  other  agents.  Action  message  passing  occurs  when 
an  agent  periodically  sends  one  or  more  of  its  selected 
actions  to  other  agents  in  the  network.  Finally,  since  agents 
are  passing  action  messages  back  and  forth,  rule  chains  can 
form  between  agents,  and  BBA  strength  adjustment 
messages  must  be  sent  between  them. 

Each  type  of  message-passing  has  different 
implications  for  the  DLCS.  Classifier  message  passing 
allows  one  agent  to  share  its  learned  knowledge  with  others. 
As  will  be  shown  in  the  next  section,  only  classifiers  with 
higher  strengths  can  be  passed,  thereby  ensuring  that  only 
“good”  rules  are  shared  between  agents.  Therefore,  if  one 
agent  has  learned  part  of  a  task  more  quickly  than  its  fellow 
agents,  that  agent  will  share  this  learned  information  and 
help  accelerate  the  learning  process  in  the  other  agents. 


Action  message  passing,  on  the  other  hand,  provides  a 
method  for  one  agent  to  directly  “talk”  to  the  others.  These 
received  network  action  messages  combined  with 
environment  interface  messages  work  together  to  “fire”  rules 
in  the  classifier  list.  The  BBA  payoff  messages  perform  the 
same  function  as  in  the  single-LCS  case  by  encouraging 
chains  to  form  between  agents,  thereby  encouraging 
“discussion”  among  agents. 

C.  DLCS  Execution  Cycle 

The  following  is  the  execution  cycle  for  the 
distributed  learning  classifier  system.  The  execution  cycle  is 
based  on  the  standard  LCS  cycle,  with  items  in  italics 
representing  DLCS  additions. 

1.  From  the  input  interface,  read  the  messages  from  the 
environment  and  post  them  on  the  message  board. 
From  the  receive  queue,  read  the  action  messages  from 
other  agents  and  post  them  on  the  message  board.  Also 
read  the  classifier  messages  from  other  agents  and 
probabilistically  replace  weak  rules  in  the  classifier  list 
with  these  new  rules, 

2.  Compare  the  messages  on  the  message  board  with  each 
classifier.  Record  a  match  for  every  classifier  whose 
condition  words  have  all  been  matched  by  these 
messages. 

3.  Calculate  bids  for  each  matching  classifier. 
Probabilistically  select  classifiers  to  post. 

4.  Clear  the  message  board  and  post  the  actions  of  the 
selected  classifiers. 

5.  Send  the  messages  on  the  message  board  to  the  output 
interface.  Send  a  subset  of  these  messages  and/or  a 
fixed  number  of  high-strength  classifiers  to  the  transmit 
queue, 

6.  Adjust  the  strengths  of  classifiers.  Extract  and  process 
any  BBA  strength  adjustment  messages  from  the  receive 
queue,  and  send  BBA  payoff  messages  to  the  transmit 
queue  as  necessary. 

Steps  1  and  5  of  this  execution  cycle  are  responsible  for 
reception  and  transmission  of  classifier  and  action  network 
messages,  respectively.  We  must  establish  rules  for 
transmission  and  reception  so  that  the  number  of  messages 
sent  over  the  network  can  be  controlled  and  so  that  one  can 
control  the  amount  of  influence  agents  have  upon  one 
another.  We  will  discuss  transmission  first. 

Transmission  of  action  and  classifier  messages 
occurs  on  step  5  of  the  execution  cycle.  We  state  above  that 
a  “subset”  of  the  selected  actions  should  be  sent  This 
subset  is  defined  by  two  variables,  the  transmit  bid 
threshold,  Bjx  *  and  the  maximum  number  of  actions  to 
transmit  NTx.action-  The  transmit  bid  threshold  defines  the 
minimum  bid  value  required  before  an  action  is  eligible  to 
be  sent  over  the  network.  By  imposing  a  bid  threshold,  only 
those  actions  whose  posting  classifiers  have  a  high  strength 
will  be  sent  since  the  bid  is  based  on  the  strength.  NTx.acrioH 


provides  a  way  to  control  the  amount  of  network  traffic  by 
imposing  an  upper  limit  on  the  number  of  actions  that  can  be 
sent,  in  case  all  of  the  actions  have  large  bids.  is 

smaller  than  the  number  of  actions  eligible  to  be  transmitted, 
then  the  actions  with  the  largest  bids  are  sent. 

Transmission  of  classifiers  on  step  5  is  also 
governed  by  two  variables,  the  transmit  strength  threshold, 
Stk  .  and  the  maximum  number  of  classifiers  to  transmit, 
^TX.ciassitier^  The  transmit  strength  threshold  functions  like 
the  bid  threshold  in  dictating  a  minimum  strength  required 
before  a  classifier  can  be  transmitted.  This  threshold 
ensures  that  only  “good”  rules  are  sent  over  the  network. 
^Tx.ciassifier  also  helps  contTol  the  amount  of  network  traffic 
by  limiting  the  number  of  classifiers  that  can  be  sent,  since 
classifier  lists  are  often  rather  large.  These  variables 
collectively  influence  the  degree  of  coupling  in  the  multi¬ 
agent  application.  Coupling  should  be  “tight”  enough  that 
collective  behaviors  emerge,  but  no  so  tight  that  network 
bandwidth  is  threatened. 

We  should  point  out  at  this  point  that  action  and 
classifier  messages  are  broadcast  over  the  network  to  all 
other  agents.  As  will  be  shown,  each  agent  has  the 
opportunity  to  discard  received  messages.  Also, 
transmission  of  these  two  message  types  is  globally  paced  by 
two  time  intervals:  an  action  transmission  interval,  lacnon 
and  a  classifier  transmission  interval  Idasstfier^  If  we  define  a 
unit  of  time  as  one  iteration  of  the  execution  cycle,  these 
intervals  determine  how  many  iterations  occur  between 
network  transnaissions.  Larger  intervals  provide  less  inter¬ 
agent  communication  and  therefore  result  in  less  coupling 
between  agents.  Smaller  intervals  provide  more  coupling. 

Reception  of  action  and  classifier  messages  occurs 
on  step  1  of  the  execution  cycle.  Since  both  action  and 
classifier  messages  are  broadcast,  there  will  in  general  be 
more  messages  in  the  receive  queue  than  were  transmitted. 
Agents  must  have  a  method  of  filtering  out  the  best 
messages  from  all  those  received.  As  with  transmission, 
action  and  classifier  message  types  are  handled  separately. 

Action  message  reception  is  governed  by  two 
variables  that  are  effectively  the  converse  of  the 
transmission  variables,  the  receive  bid  threshold,  B^x  ,  and 
the  maximum  number  of  actions  to  receive,  N^xacnon  •  These 
variables  are  separate  from  their  corresponding  transmit 
variables  because  we  may  wish  to  put  more  stringent 
requirements  on  reception  than  transmission,  since  recepdon 
will  have  an  impact  on  an  agent's  behavior.  In  other  words, 
an  agent  can  be  more  liberal  in  its  sharing  of  informadon, 
while  putting  a  higher  premium  on  the  usefulness  of 
received  informadon.  To  achieve  this  end,  one  generally 
uses  Brx  >  Btx  and  Nj^xacacn  <  NTXaetion  •  Note  that  those 
received  action  messages  that  are  not  used  are  discarded. 

Classifier  message  recepdon  is  also  controlled  by 
two  variables,  the  receive  strength  threshold,  Srx  »  and  the 


maximum  number  of  classifiers  to  receive,  N Rx.dassij^tr  . 
These  variables  perform  the  same  function  as  in  action 
message  reception.  Again,  we  want  the  reception 
requirements  to  be  more  smngent  because  each  rule  that  an 
agent  accepts  will  replace  a  weaker  rule  in  the  classifier  list. 
The  more  network-based  rules  accepted,  the  more  agent 
rules  replaced.  While  rule  replacement  is  not  a  necessarily  a 
detrimental  occurrence,  we  want  to  ensure  that  only  “bad” 
rules  are  replaced.  Again,  unused  classifier  messages  are 
discarded.  Note  that  there  are  no  receive  intervals  for  action 
or  classifier  reception  since  the  frequency  of  reception  is 
dependent  on  the  frequency  of  transmission. 

There  are  no  restrictions  on  BBA  payoff  message 
transmission  and  reception,  since  this  message  passing  is 
governed  by  the  bucket  brigade  algorithm.  The  BBA 
dictates  that  supplier  classifiers  should  be  paid  if  they  post 
an  action  message  that  fires  one  of  the  current  classifiers.  If 
these  suppliers  happen  to  be  in  another  agent^s  classifier  list, 
a  BBA  payoff  message  is  sent  over  the  network  to  that 
classifier.  Note  that  in  the  BBA  payoff  message  case, 
messages  are  not  broadcast;  they  are  sent  directly  to  the 
agent  getting  paid.  BBA  transactions  occur  on  execution 
cycle  step  6. 

Finally,  depending  on  the  task  to  which  the  DLCS 
paradigm  is  applied,  one  may  wish  to  disable  action  passing 
or  classifier  passing.  The  DLCS  has  been  designed  so  that 
action  and  classifier  passing  work  independently,  and 
therefore  disabling  one  does  not  change  the  operation  of  the 
other.  Also,  disabling  the  bucket  brigade  algorithm  disables 
BBA  message  passing. 

rv.  Multiple-Agent  DLCS  Example 

As  an  example  of  the  DLCS,  we  introduce  the 
multiple-agent  animat  problem.  An  animat,  or  “artificial 
animal,”  is  an  extremely  simple  autonomous  robot  modeled 
after  an  animal  [9}.  The  “animat  problem”  describes  the 
autonomous  robot’s  search  for  a  goal  in  an  obstacle-filled 
environment.  We  extend  this  animat  problem  to  include 
multiple  autonomous  robots,  each  attempdng  to  reach  the 
same  goal.  We  will  show  that  the  DLCS  paradigm  allows 
these  animats  to  reach  the  goal  faster  and  more  efficiently 
than  animats  with  standard  LCS  controllers. 

We  model  a  simple  autonomous  agent  as  a  treaded 
robot  subject  to  the  nonholonomic  constraint — the  robot’s 
velocity  is  limited  by  its  position,  thereby  requiring  that  the 
robot  travel  only  in  the  direction  in  which  it  is  pointed.  The 
robot  is  equipped  with  left  and  right  goal  and  obstacle 
sensors,  with  the  goal  sensors  having  a  much  larger  range 
than  the  obstacle  sensors.  However,  the  robot  does  not 
understand  its  sensor  or  kinematic  systems;  it  must  learn 
reladonships  between  sensors  and  morion  from  credit 
assignment  Credit  assignment  is  accomplished  by 
evaluating  an  agent’s  action  on  each  time  step.  The  agent  is 
rewarded  for  moving  closer  to  the  goal  and  being  more 


directly  pointed  at  the  goal.  The  agent  is  penalized  for 
moving  away  and  “looking  away”  from  a  goal  and  for 
crashing  into  an  obstacle.  The  environment  consists  of  an 
infinite  plane  with  two  long  obstacles  and  a  goal.  The 
agents  start  behind  both  obstacles,  such  that  the  obstacles 
obstruct  the  most  direct  path  to  the  goal. 

The  purpose  of  this  simulation  is  to  illustrate  the 
effect  of  LCS  network  distribudon.  Therefore,  we  have 
disabled  the  Generic  Algorithm  and  the  BBA  in  the  standard 
LCS.  For  our  DLCS  settings,  we  use  classifier  message 
passing  only,  with  Stk  =  Sfcc  =  80%  of  the  maximum  strength 
on  the  classifier  list  at  the  time  of  transmission  and 
reception,  respectively.  Nvc.ciassiper  =  ^Bx.ciassifier  =  1  and 
^Classifier  =  L  We  use  two  agents  in  the  animat  scenario,  and 
each  agent’s  classifier  list  is  initialized  with  random  rules. 


Two  Agents  with  DLCS  Control 
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Figure  2.  Simulation  results  comparing  the  DLCS  with 
the  standard  LCS, 


We  have  included  the  results  of  two  simulations  in 
Figure  2.  In  the  first,  the  DLCS  architecture  is  used  for  each 
agent.  In  the  second,  agents  use  the  standard  LCS 
architecture.  In  each  case,  the  agents’  classifier  lists  are 
initialized  to  the  same  identical  set  of  random  rules.  As  can 
be  seen  from  the  figure,  the  agents  are  much  more  successful 
in  maneuvering  around  the  obstacles  and  reaching  the  goal 
when  using  the  DLCS  paradigm  than  when  they  are  given 
traditional  LCS  controllers  with  no  inter-agent 
communication.  Because  classifier  passing  occurs  on  every 
iteration  of  each  agent’s  execution  cycle,  agents’  classifier 
lists  tend  to  quickly  converge  to  the  most  useful  rule,  in  this 
case,  a  rule  which  moves  the  agents  toward  the  goal. 
Without  the  DLCS  architecture,  agents  are  left  to  “fend  for 
themselves,”  and  their  successfulness  suffers  accordingly. 

V.  Conclusion 

While  there  are  many  facets  of  the  standard 
learning  classifier  system  that  have  yet  to  be  fully  explained, 
the  LCS  has  proven  to  be  useful  in  several  areas  of  machine 
learning.  To  fully  take  advantage  of  some  of  the  inherent 
characteristics  of  the  LCS,  we  have  introduced  its 
distributed  sibling,  the  DLCS.  With  the  additions  to  the 
learning  classifier  system  described  in  this  paper,  the  DLCS 
can  function  as  a  paradigm  for  inter-agent  communication 
and  cooperation.  We  feel  that  the  DLCS  can  be  an  effective 
tool  in  agent  organization  and  coordination  and  can  be 
useful  over  a  wide  range  of  distributed  artificial  intelligence 
tasks. 
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Abstract 

A  distributed  reinforcement  learning  system  is  designed 
and  implemented  on  a  mobile  robot  for  the  study  of 
complex  task  decomposition  and  dynamic  policy 
merging  in  real  robot  learning  environments.  The 
Distributed  Q-learning  Classifier  System  (DQLCS)  is 
evolved  from  the  standard  LCS  proposed  by  J,H. 
Holland,  We  address  two  of  the  limitations  of  the  LCS 
through  the  use  of  Q-learning  as  the  apportionment  of 
credit  component  and  a  distributed  learning 
architecture  to  facilitate  complex  task  decomposition. 
The  Q-learning  update  equation  is  derived  and  its 
advantages  over  the  complex  bucket  brigade  algorithm 
(BBA)  are  discussed.  Holistic  and  monolithic  shaping 
approaches  are  used  to  distribute  reward  among  the 
learning  modules  of  the  DQLCS  and  allow  dynamic 
policy  merging  in  a  variety  of  real  robot  learning 
experiments, 

1.  Introduction 

Recently,  considerable  interest  has  arisen  in  robot 
learning.  The  general  tliCiue  in  robot  learning  is  that  an 
intelligent  machine  is  one  that  can  sense  its  environment, 
learn  how  to  cause  change  in  its  environment  to  achieve 
a  goal,  form  plans  to  cany  out  tasks,  and  react  to 
unpredicted  external  stimuli.  In  unsupervised  learning 
the  robot  is  given  the  ability  to  explore  its  environment  in 
a  trial-and-enor  fashion  to  collect  data.  From  an 
e\’aluation  of  this  data,  the  robot  must  learn  a  mapping 
from  its  input  sensor  values  to  its  output  effector  actions. 
Reinforcement  learning  involves  the  use  of  feedback  to 
reason  about  the  quality  of  the  robot’s  condition-action 
rules. 

We  have  selected  one  unsupervised  reinforcement 
learning  algorithm,  Holland’s  Learning  Classifier  System 
(LCS)  [3,6,7],  for  study  and  implementation.  The  LCS  is 
a  rule  based,  message  passing  machine-learning 
paradigm  that  incorporates  planning  and  rule  discovery 
for  intelligent  problem  solving  in  a  dynamic 
environment.  While  the  system  we  implement  resembles 


Holland’s  original  LCS  in  structure,  several  additions 
and  substitutions  are  included.  After  discussing  the 
limitations  the  bucket  brigade  algorithm,,  the 
apportionment  of  credit  (AOC)  mechanism  in  Holland’s 
LCS,  we  offer  Christopher  Watkins’s  Q-learning 
algorithm  [10]  as  a  replacement.  We  also  examine  an 
enhancement  of  the  original  LCS,  the  Distributed 
Learning  Classifier  System  {DLCS)  [1,  5].  The  system 
we  implement  has  the  distributed  capabilities  of  the 
DLCS,  but  it  uses  Watkins’s  Q-Leaming  mechanism  for 
credit  apportionment.  We  call  this  system  the 
Distributed  Q-learning  Classifier  System  {DQLCS), 

Monolithic  systems  suffer  from  slow  learning  due  to 
the  large  size  of  the  state  space  created  by  complex  or 
multiple  goal  tasks  [11].  This  explosion  in  state  space 
size  is  called  the  curse  of  dimensionality.  Task 
decomposition  [11]  is  a  solution  to  the  problem  of 
complex  task  learning  in  which  the  overall  task  is 
divided  into  smaller  pieces.  Each  individual  task  is  given 
a  control  module  whose  objective  is  to  learn  only  to 
achieve  that  task.  Then,  instead  of  learning  over  a  single 
state  space  whose  size  is  exponential  in  the  number  of 
tasks,  the  modular  system  learns  over  a  linear  number  of 
constant  sized  state  spaces.  The  capability  of  a  system  to 
then  learn  to  coordinate  these  multiple  behaviors,  or 
policies,  is  called  dynamic  policy  merging  [11],  Many 
questions  still  remain  about  various  shaping  techniques, 
the  schemes  for' combining  reinforcement  learning  and 
distributed  control  on  the  same  system.  From  our 
experiments,  we  comment  on  the  effectiveness  of  the 
DQLCS  at  decomposing  and  solving  robot  learning 
problems  using  two  distributed  reinforcement  techniques, 
holistic  shaping  and  modular  shaping. 

2.  Developing  the  DQLCS 

The  structure  of  the  Learning  Classifier  System 
(LCS)  is  shown  in  Figure  1.  Boxes  represent  the  various 
components  of  the  LCS,  and  arrows  represent  the  flow  of 
data  through  the  system.  The  operation  of  the  system  is 
based  on  the  use  of  lists  of  evolved  production  rules  or 
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classifiers.  Classifiers  are  condition-action  pairs.  Each 
classifier  defines  a  possible  state  of  the  en\ironinent. 
Associated  with  each  classifier  is  a  strength  value.  The 
strength  of  a  classifier  is  related  to  its  current  usefulness 
to  the  system  as  compared  to  all  of  the  other  classifiers  in 
the  system.  The  goal  of  the  LCS  is  to  adjust  the  strength 
of  all  of  the  classifiers  over  time  until  the  classifiers  most 
useful  in  achieving  the  desired  goal  are  distinguishable 
from  the  rest.  A  credit  assignment  scheme  is  used  to 
update  the  strengths  of  the  classifiers.  Credit  assignment 
is  recognized  as  the  key  to  the  success  of  the  LCS. 

The  input  interface  and  output  interface  are  used  for 
interaction  with  the  environment.  The  input  interface 
receives  as  input  sensor  values  and  encodes  them  into  a 
message  for  posting  on  the  message  board.  The  output 
interface  decodes  the  action  part  of  the  winning  rule  and 
sends  the  command(s)  to  the  machine  for  execution.  The 
message  board  is  a  bulletin  board  upon  which  messages 
describing  the  current  state  of  the  s>'stem  are  “posted”. 
The  classifier  list  is  the  rule  base  for  the  LCS  that 
contains  all  of  the  system’s  rules  and  their  associated 
strengths.  Credit  assignment  or  apportionment  of  credit 
(AOC)  is  used  to  adjust  the  strengths  of  the  classifiers 
based  on  the  positive  or  negative  effects  on  the  state  of 
the  system  resulting  from  their  tise. 


Figure  1.  The  Learning  Classifier  System  (LCS) 
structure. 


2.1.  Q-learning  vs.  The  BBA 

The  apportionment  of  credit  component  in  the  LCS 
architecture  as  originally  proposed  by  Holland  is  the 
bucket  brigade  algorithm  (BBA).  In  theory,  the  BBA 
solves  the  credit  assignment  problem  by  encouraging  the 
formation  of  rule  chains.  Research  has  shown  that  the 
BBA  in  its  original  form  could  sustain  pre-existing  rule 
chains,  but  that  it  is  not  strong  enough  to  successfully 
encourage  chaining  from  a  rule  list  consisting  of  initially 


random  strings  [2,  5,  9].  The  backwards  propagation  of 
rewards  from  the  rules  receiving  the  rew'ards  to  their 
supporters  was  observ^ed  to  take  many  time  steps.  This 
slow  learning  process  results  in  the  need  for  a  large 
number  of  initially  long-running  ex'periments  before  the 
first  rules  in  the  chain  receive  any  reward. 

Because  of  the  time  requirements  imposed  by  real 
learning  robot  problems,  we  decided  that  the  BBA  is  not 
sufficient  for  use  in  our  s>'stem  and  that  some  other 
apportionment  of  credit  algorithm  is  necessary.  In  the  Q- 
leaming  classifier  system  (QLCS)  [1]  the  BBA  is 
replaced  by  Q-leaming  [10].  In  a  Q-leaming  system, 
each  state-action  rule  has  an  associated  Q-value.  The  Q- 
value  is  an  estimate  of  the  minimum  cost-to-go 
associated  with  taking  the  rule’s  action  when  the  state  of 
the  environment  matches  the  rule’s  condition.  After  a 
rule  is  executed,  the  environmental  feedback  is  used  to 
update  the  its  Q-value.  Ultimately,  the  system  reaches  an 
optimum  policy,  a  path  of  least  total  cost. 


2.2  Derivation  of  the  Q-learning  Update 
Equation 

Q-leaming  originated  as  a  recursive  algorithm  for 
solving  Markov  decision  problems.  Markov  processes 
contain  a  finite  number  of  states,  with  each  state  having  a 
finite  number  of  possible  actions.  Tlie  probability  of 
making  a  transition  from  one  state  to  another  is  a 
function  of  the  current  state  and  not  on  any  past  history 
of  the  system.  We  call  the  set  of  n  finite  states  in  a 
Markov  decision- process  S  =  {Sf},  /=!,  2, ... ,  n,.  At  each 
state,  there  is  a  set  of  nas  possible  actions  As  =  {ci^}y 
y=l,2,...,;ja».  The  probability  of  making  a  transition  from 
state  Si  to  state  Sj  given  the  action  a  is  pr(Si  sj)  =  p^fa). 
When  an  action  a{t)  €  is  taken  from  state  s{t)  ^  S  at 
time  step  t,  the  new  state  function  S(t  +  1)  =  ^(*^(0,  n(0) 
determines  the  resulting  state.  Each  state-action  rule  has 
an  associated  value,  c,(a),  that  represents  the 
instantaneous  cost  incurred  by  taking  action  a  in  state  Si. 
This  probabilistic  value  is  assumed  to  be  either  always 
positive  or  always  negative  and  is  estimated  by  the 
expected  value:  E  [Cj  (a)]  =  q  (a) . 


We  now  establish  the  value  function  F^(0,  the 
expected  sura  of  all  future  discounted  costs  where  the 
system  starts  at  state  Si  and  follows  the  state  function  p: 


lira  E 


t^O 


(1) 


The  summation  includes  all  future  states  of  the 
s>'stem,  where  St  is  the  state  of  the  system  at  time  t  and  y 
is  the  discount  factor,  y  e  [0,1],  By  appl>ing  the 
discount  factor,  we  emphasize  more  immediate  future 
costs  over  distant  future  costs.  The  inclusion  of  this 


discount  factor  and  the  previously  mentioned  restriction 
on  cost  function  sign  facilitate  the  convergence  of  the 
summation  [10]. 

Bellman’s  principle  of  optimality  [8]  is  used  to  find 
the  optimal  policy,  a  policy  that  minimizes  the  future 
expected  cost.  Bellman’s  equation  can  be  stated  as  the 
following: 

=  min  |c,(a)+x  £  Pij{a)v‘{sA\ .  (2) 

This  equation  says  that  the  minimum  total  cost  from 
the  current  state  to  the  goal  is  the  sum  of  the  minimum  of 
the  expected  instantaneous  costs  for  actions  from  the 
current  state  and  the  minimum  cost  of  going  to  the  goal 
from  the  resulting  next  state. 

By  reformulating  (2)  as  a  recurrence  relation,  the 
costs-to-go  may  be  estimated  over  repeated  trials.  Value 
iteration,  a  recursive  estimation  equation  of  the  form 

=  Z  Pi,  (3) 

has  been  shown  to  converge  to  the  optimal  value  policy 
V’iSi)  for  a  given  initial  estimate  l^(r,)  [10].  That  is,  if  at 
the  fe*  iteration  V'isi)  is  estimated  as  then 

(s.)  ^  V’(s,)  as  A:  00. 

Watkins  [10]  reformulated  Bellman’s  equation  by 
adding  Q-value  notation: 

Q'{si  ,a)  =  Ci  (a)  +/ S  Pij  (a)y'{sj)  •  (4) 

yeX 

In  this  equation,  Q*(Si,a)  is  the  Q-value  associated  with 
taking  action  a  from  state  5,.  Equation  (5)  shows  the 
simplification  of  Bellman’s  equation  (2)  resulting  from 
the  substitution  of  Watkins’s  Q-value  notation: 


F*(., 

)  =  min  ,a). 

(5) 

Appl>ing  the  > 

r'alue  iteration  technique  to 

this 

simplified  form  of 

Bellman’s  equation  gives  us 

the 

following  results: 

=  min  0^(si,a) 

(6) 

and 

(7) 

Now  the  recurrence  relation  (3)  becomes 
min  *- 

asA(,y 


min  nun  (8) 

Sj&S 

Since  this  is  known  to  converge  to  an  optimal  solution, 
we  can  rewrite  (8)  as 


+  P,yW  min  (9) 

The  recurrence  relation  of  (9)  provides  an  estimate 
for  the  Q-value  of  the  state-action  pair  (j,,  a)  in  terms  of 
an  expected  instantaneous  cost  and  a  weighted  sum  of 
minimum  costs-to-go  for  the  state  action  pairs  (sj,  a).  We 
then  approximate  the  unknown  values  c,  (a)  and  p^fa) 
as: 

cfa)^cfa),  (10) 

and 

L  Pij(o)y(Sj)«y(sj).  (11) 

Sj^S 

In  (10)  and  (11),  the  estimate  of  the  expected 
instantaneous  penalty  is  the  single  sample  value  of  the 
incurred  instantaneous  penalty,  and  the  expected 
minimum  cost-to-go  is  estimated  as  the  minimum  of  the 
estimated  costs-to-go  at  the  next  state. 

We  can  write  the  estimate  of  Q(Si,a)  at  the  (A:+y)“ 
iteration  by  substituting  (10)  and  (11)  into  (9): 

g(""')(5,,a)  =  c,(a)+y^*(s^.).  (12) 

Let  Q:(st,at)  be  the  current  estimated  minimum  cost 
for  executing  action  a  in  state  s  at  time  t.  After  taking 
this  action,  we  update  our  estimate  of  Ofs^at)  using  (12). 
The  old  and  new  estimates  are  combined  in  the  relaxed 
Q‘leaming  update  equation: 

2.+1  (^f .  a, )  <-  [l  -  a(.r, ,  )]2,  (-y, .  a, ) 

+a  U.a,)[c^(a,)+r{^,(j,+i)]  (13). 

In  (13),  a(St,  at)  is  a  learning  rate  between  0  and  L 
The  learning  rate  thus  provides  a  means  to  weight  the 
combination  of  the  Q-value’s  past  estimation  and  the  new 
measurement. 

2.3.  The  QLCS  Execution  Cycle 

After  replacing  the  BBA  with  Q-leaming  and  rule 
strengths  with  Q-values,  the  QLCS  execution  cycle  is  as 
follows. 

From  an  initialized  set  of  Q-values,  Qq. 

1.  Observe  the  current  state  s  of  the  system. 

2.  Compile  a  list  of  all  eligible  classifiers  E{s,  t) 

3.  Select  a  winning  classifier  using  a  stochastic 
selection  method. 

4.  Pass  the  action  part  a  of  the  winning  classifier  to  the 
output  interface  to  be  executed. 

5.  Advance  the  system  clock:  /  =  ^  +  1. 

6.  Receive  an  immediate  cost  c(s,a)  for  executing 
action  a  in  state  s  at  time  t, 

7.  Examine  the  new  message  board. 

8.  Compute  the  new  eligibility  set  E{t)  given  the  new 
environment  state. 


9.  Rank  all  of  the  classifiers  in  E(t)  based  on  their  Q- 
values, 

10.  Update  the  Q-value  of  tlie  classifier  chosen  during 
the  previous  clock  tick  using  the  Q-leaming  update 
equation  (13). 

11.  Make  a  probabilistic  selection  of  the  classifier  with 
the  maximum  (or  minimum)  Q-value. 

12.  Goto  4. 

In  many  problems  the  state  space  representation  of 
the  environment  is  too  large  for  a  monolithic  learning 
sj'stem  to  explore  in  a  feasible  time  period.  A  distributed 
architecture  more  suited  to  this  type  of  problem  was 
introduced  by  Dorigo  [4].  His  architecture  allows  for  the 
distribution  of  internal  LCSs  for  the  study  of  learning 
problems  involving  real  robots.  The  Division  of  Labor 
(DOL)  architecture  is  proposed  by  Bay  and  Stanhope  [1]. 
In  the  DOL  learning  system^  separate  modules  focus  on 
solving  independent  parts  of  a  learning  problem.  Tliis 
property  closely  matches  the  idea  of  task  decomposition. 

The  DOL  architecture  contains  a  layer  of  thinker 
LCSs,  each  focusing  on  learning  a  specilic  behavior. 
Above  this  layer  resides  a  combiner  LCS.  The  job  of  the 
combiner  is  to  coordinate  the  decisions  of  the  thinkers 
and  to  choose  the  ultimate  output  action.  Tliis 
architecture  also  includes  a  mediator  for  distributing 
input  sensor  data  and  reward  to  the  thinker  LCSs.  The 
decomposition  of  the  input  messages  is  the  most 
important  property  of  the  DOL  architecture.  It  greatly 
reduces  the  size  of  the  state  space  search  that  must  be 
completed  in  a  learning  application. 

Besides  the  standard  AOC  problem,  distributed 
s}^stems  have  the  problem  of  shaping,  the  decomposition 
and  distribution  of  reward  to  each  LCS  in  the  system. 
Holistic  shaping  treats  the  entire  learning  system  as  a 
black  box.  The  same  reinforcement  is  blindly  given  to  all 
of  its  LCSs.  This  reward  scheme  introduces  some 
ambiguity  problems  to  the  s>'stem,  however.  Thinker 
LCSs  may  be  rewarded  for  bad  decisions  or  may  go 
unrewarded  for  correct  decisions.  In  modular  shaping, 
the  thinker  LCSs  are  first  trained  independently.  After 
they  have  obtained  a  suitable  level  of  performance,  their 
learning  is  “turned  off’,  turning  them  into  reactive 
s>'stems.  The  combiner  learns  to  coordinate  the  action 
messages  of  the  thinker  reactive  systems  to  achieve  the 
complex  goal  in  a  separate  learning  exercise. 

3.  Experimental  Results 

We  obsexv^ed  the  performance  of  various 
configurations  of  the  QLCS  and  DQLCS  when  applied  to 
t>pical  real  robot  learning  problems.  The  small  mobile 
robot  was  equipped  ultrasonic  range  finders  and  infrared 
sensors  that  detected  modulated  light  from  specially- 


constructed  “beacon”  emitters.  The  problem 
environment  selected  for  the  learning  s>'stem  is  shown  in 
Figure  2.  The  problem  is  common  in  robotics:  find  the 
best  path  through  an  environment  to  a  goal  position. 


Figure  2.  The  goal  seeking  problem  environment. 

We  used  three  .infrared  goal  beacon  detectors  and  a 
middle  ultrasonic  sensor  as  input  sensors.  For  these 
sensors,  the  “docked”,  or  goal,  position  was  then  defined 
as  any  position  where  the  robot  was  “seeing”  the  goal 
beacon  with  at  least  the  middle  goal  detector,  and  the 
range  from  the  ultrasonic  sensor  was  one  unit  of 
measure.  Table  1  provides  a  breakdown  of  sensors  and 
their  associated  penalty  values  used  for  instantaneous 
cost  calculations  in  these  experiments. 


Sensor 

Associated  Penalty 

Middle  ultrasonic 

10*(3-bit  range 
value) 

Left  Goal  #2 

0.25*50=  12.5 

Middle  Goal  #2 

50 

Right  Goal  #2 

0.25*50  =  12.5 

Table  !•  QLCS  penalty  values  for  experiments 

An  upper  limit  of  50  time  steps  was  used  for  each 
experiment  Regardless  of  the  position  of  the  robot,  the 
system  terminated  the  trial  after  the  50^  clock  cycle. 

3.1  The  Monolithic  Approach 

The  docking  problem  was  first  studied  using  a 
monolithic  Q-leaming  classifier  system.  The  system 
used  6  bits  of  input:  1  bit  from  each  of  three  goal  beacon 
detectors  and  3  bits  from  the  ranging  sensor.  The  output 
action  string  for  each  classifier  was  four  bits:  2-bits  for 
rotation  and  2-bits  for  translation.  The  state  space  of  the 
problem  was  then  covered  by  or  1024  classifiers.  The 
initial  state  of  the  environment  is  shown  in  Figure  2. 


Figure  3,  Learning  curve  for  monolithic  QLCS. 


Figure  3  shows  the  monolithic  QLCS’s  performance. 
This  system  was  observed  for  100  trials.  For  most  of  the 
experiment,  it  is  difficult  to  find  any  system 
improvement.  Within  the  final  15  trials,  however,  the 
system  appears  to  be  settling  to  some  standard 
performance.  It  also  “timed  out”  less  as  the  experiment 
progressed. 

3.2  The  Distributed  Approach 

Next,  we  studied  the  same  problem  with  the 
distributed  architecture.  Inside  the  DOL  framework,  we 
observed  the  performances  of  holistic  and  modular 
shaping.  Table  2  shows  the  DQLCS  configuration. 


QLCS 

Input  bits/Source 

Output 

Clfs 

Thinker#! 

3/range 

2-bit 

32 

Thinker  #2 

3/beacon  detectors 

2-bit 

32 

Combiner 

2/output  from  Thinker  I 
2/output  from  Thinker  2 

4-bit 

256 

Table  2.  DQLCS  classifier  configuration. 


3.2.1  Holistic  Shaping 

Figure  4  shows  tlie  learning  curve  for  the  holistic 
system.  There  was  only  marginal  improvement  shown  by 
tliis  system.  The  system  “timed  out”  during  most  of  the 
first  20  trials.  When  the  goal  was  reached,  more  than  25 
time  steps  were  always  necessary.  As  the  experiment 
progressed,  the  robot  was  able  to  find  the  goal  more  often 
and  more  quickly,  but  even  then  performance  was  only 
marginal. 


Figure  4.  Learning  curve  for  holistic  shaping  DQLCS. 


3.2.2  Modular  Shaping 

For  use  with  a  modular  shaping  DQLCS,  we  divided 
the  “goal-seeking”  behavior  into  the  sub-tasks  of 
“approach  goal”  and  “locate  goal.  The  “approach  goal” 
behavior  involved  translation  only,  while  the  “locate 
goal”  beha\aor  rotation  only  used  rotation  actions. 

For  the  “approach  goal”  behavior,  the  QLCS  used  3- 
bit  range  data-  as  input.  Four  possible  translation 
commands  were  selected  by  the  2-bit  output  action 
strings.  The  cost  function  returned  a  constant  multiple  of 
the  range  to  the  goal.  The  learning  curve  for  the  system 
is  shown  in  Figure  5. 


Figure  5.  Learning  curve  for  “approach  goal”  behavior. 

The  objective  of  the  “locate  goal”  behavior  was  for 
the  robot  to  learn  to  use  rotation  commands  orient  itself 
towards  the  goal  beacon.  The  QLCS  used  a  3-bit  input 
(one  bit  from  each  of  the  three  goal  detectors)  and 
generated  a  2-bit  action  string  that  was  used  to  select 
from  among  four  rotation  commands.  The  goal  detector 
penalties  from  Table  1  were  used.  To  insure  that  the 
system  learned  the  behavior  completely,  tlnee  different 
initial  environments  were  used.  For  Trials  1-30,  the 
starting  position  had  the  robot’s  left  goal  detector  seeing 
the  beacon.  In  Trials  31-50,  the  right  beacon  detector 
initially  saw  the  beacon.  In  the  remaining  trials,  the 


robot  was  placed  such  that  the  beacon  was  directly 
behind  it. 

The  results  of  the  experiment  are  shown  in  Figure  6. 
Tliis  learning  curve  is  actually  a  combination  of  three 
separate  learning  curves,  one  for  each  new  starting 
position.  In  each  of  the  three  starting  positions,  we  see 
that  the  s>'stem  quickly  learns  the  appropriate  course  of 
action  to  quickly  reach  the  goal  state. 


Figure  6,  Learning  curve  for  rotation  behavior. 


After  the  “locate  goal”  and  “approach  goal” 
behaviors  were  learned,  the  QLCSs’  learning  was  “turned 
off’.  Using  the  two  QLCSs  as  thinkers,  an  untrained 
combiner  was  given  the  “seek  goal”  task  using  modular 
shaping.  Figure  7  shows  the  learning  curve  for  the 
modular  shaping  DOL  system.  The  improvement  of  the 
system  from  the  first  trial  to  the  last  is  easily  visible.  As 
is  shown  in  Figure  7,  the  system  learned  a  path  that 
required  approximately  11  time  steps  after  only  30  trials. 


Figure  7.  Learning  Curve  for  modular  shaping  DQLCS. 

4.  Discussion  of  Results 

From  a  comparison  of  the  performances  of  the  three 
classifier  systems,  we  can  see  that  the  DQLCS  with 
modular  shaping  was  best  suited  for  this  problem.  It 
appears  that  the  size  of  the  knowledge  base  in  the 
monolithic  system  was  its  downfall.  There  simply  were 
not  enough  visits  to  each  state  for  the  incremental 
learning  algorithm  to  be  successfiil.  The  ambiguous 
nature  of  the  holistic  shaping  distributed  system  is  its 


major  problem.  It  appears  that  reinforcement  is  assigned 
to  tlie  thinker  classifiers  in  a  manner  that  is  much  too 
haphazard.  Tlie  learning  process  is  slow;  therefore, 
overcoming  misguided  reinforcement  often  takes  too  long 
to  be  practical  in  real  robot  applications. 

The  performances  of  the  QLCSs  show  that  Q- 
leaming  is  an  acceptable  alternative  to  the  BBA  as  the 
apportionment  of  credit  component  of  the  LCS.  The 
results  from  the  DQLCS  experiments  indicate  that  the 
distributed  architecture  using  modular  shaping  is  more 
time  efficient  at  solving  tasks  that  require  the  learning  of 
complex  behaviors  than  either  a  monolithic  architecture 
or  a  distributed  architecture  using  holistic  shaping.  While 
the  DQLCS  with  modular  shaping  far  outperformed  the 
DQLCS  with  holistic  shaping,  it  also  required  the 
infusion  of  much  more  domain  specific  knowledge.  This 
requirement  limits  its  application  to  a  considerably 
smaller  class  of  problems  than  may  be  attempted  by  the 
DQLCS  with  holistic  shaping. 
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ABSTRACT 

communicauons.  But  in  LCS  'Z  '  architecture;  i.e..  by  posted-mesSge 

shared  resource.  ^  thereby  requinng  no  persistent 

consider  dwm^ei^^uIe^ob^uL'u- hid  mobile  robot  navigation  in  completely  unstructured  environments.  We 

the  world-model  O^t  LcSS  environment,  in  addition  to 

Characteristics  in  addiUon  tololviSlts 

unaltered,  in  robots  with  a  wide  variprvnfv*  ^  ^  ^  ut  results  in  a  learning  controller  that  is  equally  applicable 

based  controller  tiuough  both  s.mulau'^on'^J:::^!^^^^  efTecriveness'of  this  LCS- 

pI^s^rbehTv^rofthrLcTfrom^eril^^^^  Classifier  System  (DECS),  which  generalizes  the  message- 

communication  ^ode  ^^^ents.  tIis 

DECS  is  shown  to  have  potential  application  as  a  learning  00^0!^? f^nitip^intenS^t'^nn^''^*'’'^"^ 

1.  INTRODUCTION 
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inteili<yent  controllers  whirh  extreme  are  planning  controllers,  including  hierarchical 
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whether  an  algorithmic  desizn  Drocedun*  ’c™  actions  in  complicated  environments.  Furthermore,  it  is  unclear 
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Planning  systems  can  better  choose  behaviors  geared  toward  long-term  rewards.  The>'  have  long-term  memory  that  can  be  used 
to  represent  their  environment.  They  can  process  alternatives  with  symbolic  manipuJation  and  arrive  at  correct  decisions  that 
may  be  impossible  with  fixed  rules  and  local  senses.  They  can  anticipate  their  consequences,  evaluate  their  progress,  and  plan 
fumre  actions.  However  their  goal-motivation  may  make  them  simation-insensitive.  Sensor  input  may  be  dwalued  in  favor  of 
their  world  models,  and  they  therefore  may  be  slow  in  adapting  to  a  changing  environment.  Their  world  model  itself  may  be 
taxing  to  acquire,  maintain,  and  reconcile  with  contingencies. 

A  reasonable  compromise  for  a  practical  control  system  is  some  sort  of  adaptive  combination  of  the  two  approaches;  a 
controller  that  follows  stimuius/response  rules  at  low  levels  of  competence,  such  as  in  an  execution  layer,  but  which  still  allows 
the  potential  for  planning.  learning,  and  prediction.  The  reactive  behaviors  should  be  learned  and  evaluated  on-line.  Such  a 
capability  is  potentially  embodied  in  the  learning  classifier  system  (LCS)  [3,  6]. 

*An  LCS  is  a  computing  machine  that  uses  the  architecture  of  a  rule-based  system,  but  also  allows  for  memory  and  learning.  It 
allows  a  user  to  paramemcally  tune  it  to  weight  its  reactive  vs.  planning  characteristics,  thereby  making  it  more  or  less  goal- 
oriented  or  situation-oriented.  Furthermore,  the  LCS  functions  as  a  modular,  message-passing  controller,  so  that  data 
transmission  is  encapsulated  in  low  bandwidth  signals  that  simplify  communications.  In  an  intelligent  agent,  this  characteristic 
allows  us  to  construct  behavioral  modules  that  pass  discrete  messages  with  consistent  formats.  In  this  way,  the  LCS  functions 
much  like  an  object-oriented  program  (OOP),  and  indeed,  OOPs  are  a  favored  simulation  tool  for  their  study.  This  feature  also 
makes  the  LCS  potentially  extensible  into  multiple-agent  domains. 

In  this  paper,  we  will  review  the  structure  and  operation  of  the  learning  classifier  syirtem  and  apply  it  to  the  learning  control  of  a 
simple  mobile  robot  performing  as  an  animat.  This  controller  will  have  to  leara  to  interpret  sensor  data,  drive  the  robot 
purposefully  toward  a  goal,  and  avoid  obstacles  en  route.  We  show  that  the  learning  properties  of  the  classifier  system  enable  us 
to  build  fast,  sparse  stimuius/response  rule  bases  with  default  hierarchies.  We  also  propose  a  distributed  architecture  for 
populations  of  such  robots  in  message-passing  environments.  Simulation  studies  show  that  the  LCS  can  provide  adaptive 
controllers  that  are  in  some  ways  simpler  than  even  deterministic  Boolean  functions. 

L2  Past  Work 

The  need  for  learning  techniques  for  the  generation  of  reactive  behaviors  has  been  widely  recognized  recently.  Among  these, 
Kube  [3,  9],  used  an  adaptive  logic  network  to  learn  the  structure  of  a  combinational  logic  circuit  that  controls  the  movement  of 
multiple  mobile  robots.  Ram  [11]  and  Sims  [14]  both  used  genetic  algorithms  to  leam  reactive  behaviors,  with  [14]  taking  the 
additional  step  of  combining  the  structure  of  the  robot  into  the  evolutionary  process  along  with  the  controller.  An  advantage  of 
using  the  genetic  algorithm  is  that  the  system  remains  continually  adaptive  and  can  adjust  through  incremental  changes  in 
structure  and  function.  It  also  allows  the  designer  to  impose  arbitrary  expert  rules  on  the  system  as  constraints  or  defaults,  and 
to  preserve  them  under  evolutionary  operators  through  elitism  (lOj. 

Lsarrung  classiricrs  themselves  have  been  used  to  control  the  behavior  of  animats,  most  directly  by  Wilson  [16],  and  Dorigo  [4], 
Wilson  proposes  the  environmental  reward  technique  that  we  adapt  for  this  paper,  but  shows  ex^riments  for  the  learning  of  a 
iOgic  function  only  (the  multiplexer  problem).  Dorigo  uses  the  classifier  system  to  track  a  moving  target,  and  shows  some 
behaviors  consistent  with  predictive  abilities  in  the  learned  behavior. 

Our  approach  is  closest  to  that  of  Husbands  [7]  and  Wilson  [16],  as  we  apply  the  LCS  to  an  animat  model  in  a  static 
environment.  Our  main  contribution  here  is  to  show  evidence  that  in  such  a  situation,  reactive  learning  is  effective  without 
planning  abilities,  and  that  a  fest  learning  algorithm  can  economize  on  even  deterministic  logic.  We  also  demonstrate  thaf  the 
transient-message-passing  nature  of  the  LCS  extends  directly  to  multiple  agents,  so  that  the  same  control  architecture  might  be 
used  in  arbitrarily  sized  populations. 

1.3  Review  Of  Learning  Classifier  Systems 

The  learning  classifier  system  was  developed  by  Holland  and  his  associates  after  the  effectiveness  of  the  genedc  algorithm  haH 
been  established  [3,  6].  The  LCS  combines  genetic  operators  and  apportionment  of  credit  schemes  together  with  an  architecture 
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The  overall  arrangement  of  the  LCS  is  shown  in  Figure  I.  It  consists  of  a  set  C  =  {C, } .  /  =  l.....„  of  classifiers  each  wth  A- 

(*  S  1)  conjunctive  condition  words  c,  and  a  single  action  word  a.  At  an  iteration  time  t,  each  classifier  C,  is  therefore 

represented  by  a  (f:  -  1)  -tuple  of  words,  C,(t)  =  (c„(f) . c,i(0.a.(r)) .  Each  of  these  words  is  a  string  of  length  /  with 

element  taken  from  tJhe  set  {0, 1,  # } . 


Separate  from  this  is  the  message  board,  which  at  all  times  consists  of  at  most  m  messages;  M(t)  =  {^[(0 . Each 

message  is  a  single  word  of  the  same  length,  /,  and  consists  of  elements  from  the  set  {0,1} . 


A  data  path  exists  from  the  action  words  of  the  classifier  list  to  the  message  board.  Execuuon  in  its  simplest  form  proceeds  as 
ch  classifier  compares  its  conditions  to  the  message  board.  A  condition  is  said  to  match  a  message  m^.  denoted 

(0  «  Wp  (/) ,  if  each  0  and  1  in  c^  has  an  equal  bit  in  the  same  position  in  .  and  the  ^-symbols  act  as  don't  cares.  An 

eligibility  index  set  £  is  created  that  contains  a  list  of  the  classifiers  Q  for  which  all  conditions  are  matched  by  at  least  one 
message  in  .V/;  i.  e.,  £(/)  =  {/ ;  Cy  (0  ®  (/)  for  ally  =  1, . . . ,  1:  and  any  /?} . 


1  igure  1.  The  architecture  of  the  learning  classifier  system. 

^  Jfrwg/A  value  that  signifies  the  overall  merit  of  the  rule  over  its  lifetime.  The  use  of  the 
vanes  with  ^e  implementation  of  the  apportionment  of  credit  and  reinforcement  components,  but  it  can  always 

oe  interpreted  as  a  relative  figure  of  merit. 

Also  necessary  is  a  supplier  list  1(0  =  (Ii of  length  m,  which  contains  integers  denoting  the  identity  of  the 
classifier  that  posted  the  corresponding  message  at  the  previous  time  step  n  for  each  y  =  l m,  (/)=/,  where 

(0  =  ai(t-l).  These  are  known  as  the  suppliers  of  those  messages.  This  list  wiU  later  allow  a  classifier  to  know  which  of 
ns  counterparts  posted  a  message  that  now  matches  one  or  more  of  its  conditions.  In  the  case  that  message;  was  posted  by  the 


St.  S>TOW1  (such  as  0)  may  be  entered  in  L  that  conveys  this  information.  Tms  u-ill  enable  us  to  create 

e  sets  o(  supporters  that  wll  be  important  for  our  apportionment  of  credit  scheme. 

W^therefS^Jn have  fewer  messages  than  eligible  classifiers;  m  <  e' ,  where  £*  is  the  cardinality  of  the  set  £(/) 
the  mec^sm  that  decides  which  of  the  classifiers  in  £(r)  is  allowed  to  ^st  its  action  on 

e  message  bo^±  Knoivn  as  roulene-wheel  selecuon,  classiners  are  selected  from  £(0  to  post  their  actions  probabiiisticallv. 

wt  max(w,  n  )  classifiers  chosen  according  to  a  bid  value  associated  with  each  classifier.  Classifier  /’s  bid,  B.  (;)  is  a  fimire 


or,  using  the  Boltzmann  distribution. 


IS;«) 

y  tffCf) 


(1) 


PiO  = 


JSi(t) 


y«£(f) 


^  US  to  control  the  likelihood  of  allowing  low-bidding  rules  to  post.  This  selection 
echanism  is  someumes  reterred  to  as  the  conflict  resolution  component  of  the  classifier  system. 

The  m«sage  board  at  the  ne.xt  time  iteration  consists  of  all  sensor  messages  and  the  action  parts  of  all  classifiers  so  chosen.  We 
%vill  call  this  set  of  winning  classifiers  C  . 

Sfnte™rL  ^  themselves,  or  they  may  be  derived  from  a  sensor  interface.  Such  an  interface 

mr^ht  interpret  raw  sensor  data  and  set  appropriate  bits  in  a  message.  These  messages  generally  are  not  subject  to  the  roulette 
w  heel  compeuuon  with  the  acuons  of  the  classifiers,  because  sensor  data  mav  be  necessarily  acciLible  at  any  Ume  In  practice 

the  number  ofsensorswll  dictate  the  minimum  allowable  length /for  the  messages'.  ss  me  at  any  ume.  In  pracuce. 

(tre-^SSbk  interface  recognizes  postings  that  have  a  true  output  tag 

(pre  specifi^  bit  posiuon(s)),  and  routes  them  instead  to  a  set  of  actuators.  Optionally,  the  outputs  may  <^0  to  both  the  effector 
interface  and  the  message  boari  in  order  that  classifiers  may  see  the  current  state  of  the  actuator. 

1.3  Apportionment  of  Credit  and  Reinforcement 

rne  apporaoment  of  credit  block  in  the  diagram  can  take  a  number  of  forms.  As  originally  proposed  bv  Holland,  this  was  the 

this  pnvilege  by  shanng  its  strength  with  those  classifiers  that  previously  posted  messages  that  nw  enable  if  its 

t  mesigrboard.  (Thf  distinJuoa  bet^^n 

Scitiffs  ^  ^  ^  classifier's  strength,  its  spedjicity,  and  its  support.  The 

dlSrf  J  of  no/r^ont  cares  in  the  rule,  thereby  weakening  default  rules.  The  support  is  the  sum  of  thTstrengths 

DeSls  Sbba"  iSori,?'  that  are  triggered  by  other  mJes  that  are  themselves  relatively  stro^ 

Details  of  the  BBA  algonthm  may  be  found  m  (2, 3.  6,  and  131.  In  principle,  the  BBA  emulates  a  consumer/supplier  Lnomy 

riS  concatenate  each  sensor  input  string  with  an  internally  posted 

s^uiSSLlv^T®  sensor  /  part  internal  message)  and  ensures  that  an  ext^  action  may  be  tak^ 

sunuitaneously  with  an  mtemal  message  posting.  ^ 
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that  encourages  backward-chained  sequences  of  ruies,  thus  accompiishinff  croal-orientf^npcc  t’  ■  r 

pnmarily  responsible  for  encouraging  situadon^rientedness.  °  ^  reinforcement  component  is 

The  reinforcement  component  is  closely  ded  to  the  environment,  and  mav  also  use  an  inr-Ui^-nf  n„nr  -  ■ 

n-=nsth=„ 

1.4  Discovery  Component 

operators  [lOJ.  We  also  use  ehtism,  which  prevents  good  niles  from  being  destroyed  by  the  random  evoludonary 

2.  THELCSAiND  THE  ANIMAT  PROBLEM 

“  autonomous  mobile  agent  that  lives  in  an  unstructured  environment  Its 
P  occuveiy  serves  as  a  model  for  arbitrary  problems  m  stare  spaces,  and  in  panicular.  Markov  decision  processes. 

™  ^  m„bile  robot  model  Fifunr  2  shows  the  model  and  the 

desotbed  by  the  iet  of^t  JoST"  '  '"“^"""“ts  are  relative  to  itself.  The  modon  of  the  robot  is 

Kodor  =  J  (^nghf  ^U/t ) 

Mrohenmrre"*^'  E‘l““°“0)i»hyb«to=ly  approximated  to  give  estimaledpositionp 

P robot  (0  S  Pro(„l  “  I)  +  J  (^'ng/ir  (^  -  1)  +  (^  -  1)) 

9(0  s  e(r  - 1)  (r  - 1)  -  kp^(r  - 1)) 

Uie  left  motor  will  cause  u  to  turn  nght  or  that  turning  on  both  motors  will  cause  it  to  go  forward.  This  knowledge  wiU  have  tS 
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ue  learned.  Furthermore,  the  robot  will  have  to  learn  to  navigate  in  its  world  of  goal  and  obstacles.  Feedback  to  enable  both  of 
these  learning  behaviors  will  come  from  the  sensors.  However,  while  the  classifier  s>stem's  input  interface  will  tell  the 
controller  only  binary  information  (object/no  object)  from  the  detection  region,  the  environmental  reward  component  will  have 
analog  data  on  the  distance  to  the  goal  and  from  the  nearest  obstacle  (provided  they  are  within  the  detection  region).  Thus  the 
reward  can  be  based  on  proximity  values.  The  rules  fire  in  response  to  object  detectors,  while  the  reward  utilizes  object  ran<ge 
Note  that  our  sparse,  low-resolution  sensor  data  will  result  in  perceptual  aliasing  [15],  so  that  the  robot  wll  have 
dicaculty  disilnguishing  different  world-states.  This  is  intentional  and,  we  believe,  realistic. 


Sensing  Regions; 


larc  goal  sensor 


[oft  obstacle  tensor 


right  goal  tanaor 


right  obstacle 


I  ig^e  2.  Sensor  and  kinematic  models  for  the  animat-robot.  Modeled  after  ultrasonic  and  infrared  sensors,  respectively,  the 
goal  sensors  have  a  much  larger  range  than  the  obstacle  detectors,  and  both  sensor  ranges  overlap  dead-ahead  (so  that  the  robot 
IS  not  doomed  to  a  life  of  tacking  back  and  forth  as  it  travels.) 

2.2  -Application  of  the  Deterministic  Classifier  System 

^  a  prehminaiy  baseline  e.xperiment,  we  first  show  how  this  problem  can  be  solved  with  a  fixed  rule  base  that  implements 
Boolean  logic.  This  will  give  a  deterministic  classifier  system  that  uses  no  bucket  brigade.  This  hand-coded  classifier  system 
can  be  represented  by  the  equations: 

left_  wheel  =  Jg  +  +  JjJ, 

right_^  wheel  “  *^0*^3  ^ 


where  sg  .  •Ti  .  .^2 .  and  sj  denote,  respeedvely,  the  right  and  left  goal  sensors,  and  the  right  and  left  obstacle  sensors. 

.0  implement  these  functions  in  the  deterministic  classifier,  we  will  use  /  =  6,  k  =  1,  /n  =  1 ,  with  the  only  message  on  the 

message  board  being  reserved  for  the  sensors,  and  all  postings  going  directly  to  the  output.  The  format  for  the  condition/action 
:Lnngs  are  shown  in  Figure  3: 


condrtion/actioo  strings 

X 

1  X 

S3 

S2 

S1/RW 

S0A.W 

tag"  bits: 


00  msg.  from  sensors 

01  msg.  to  aduators 

IX  reserved  tags 


igure  3 .  Word  formats  for  the  condition  suing  and  action  string.  Unused  bit  positions  may  be  reserved  for  use  in  funire  tags, 
jch  as  might  indicate  whether  a  message  was  posted  by  a  classiiier  or  a  sensor. 


SP!B  Vol.  259 1  I  93 


succassnJIy  implements  the 


With  this  format,  one  can  show  that  the  sv'siem  of  the  followin'^ 
Boolean  functions  (5):  ° 


seven  classifiers  in  Table 


1 


conditions 

actions 

(rightmost  two  bits  posted 
to  actuators  fon/off)) 

Q 

[  0 

0 

# 

j_p 

0 

1 

# 
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10^ 

# 

0 

1 

■M 

ff 
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rr 
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u 

rr 

0 

0  1 
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# 
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n~ 

0l 
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1 

0 

1 

# 

1 

# 

Q 

0 

1 

0  1 

rr 

# 

0 

1 

rr 

# 

1 

# 

0 

0 

# 

-OJ 

0 

Tl 

0 

1 

# 

#  1 

Tl 

szSBiESi^ 


fSni  (w(.h«  bucket  bnpde).  The  »nt  (a)  ia  Ute  -huaeliue-  um  execuKd  wid.  Ute 

lower  leS  posiuon  to  the  star  at  tifupper  ri^  igure.  the  honioutal  bats  ate  obstacles,  aud  the  robot  mun  sian  liptu  the 
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3.  USING  THE  LEARNING  CLASSEFEER 


Given  the  performance  of  the  deterministic  classifier  as  seen  in  Figure  4a,  one  would  expect  that  a  learning  classifier  might 
amve  at  a  similar  set  of  rules  after  experimentation.  Although  the  deterministic  control  rules  generated  (5)  were  not  a  unique 
choice,  we  con  follow  the  lead  of  past  literature  and  consider  them  a  "correa"  set  by  w'hich  to  measure  the  learned  classifiers 
experiment  is  to  learn  to  choose  a  suitable  set  of  learning  parameters,  including  the  genetic  operator 
probabilities  and  the  environmental  reward/penaJty  functions,  and  then  analyze  the  rules  that  result.  We  might  guess, 
incorrsctiy  ,  that  when  the  classifier  system  has  learned  to  guide  the  animat  to  the  goal,  that  we  might  "read”  the  classifiers  in 
the  list,  translated  by  the  word  format  of  Figure  3,  and  derive  Boolean  logic  similar  to  equations  (5).  A  similar  end  result  was 
produced  with  an  "adaptive  logic  network"  in  [8,  9]. 

Because  we  are  temporanly  investigating  the  learning  of  a  combinational  logic  function,  we  eliminate  any  sequential  processing 
by  turning  off  the  bucket  brigade.  Wc  will  again  use  /  =  6,  ^  =  1,  m  =  1 ,  and  /i  =  32  .  We  will  initialize  these  32  classifiers 
»vith  random  arrangements  from  {0,1,#}  (e.xcept  for  the  fixed  "tag"  columns  which  remain  invariant  even  under  genetic 
operations).  We  use  a  (dimensionless)  step  size  of  t  =  15  and  a  robot  radius  of  r  =  1 . 

\Vhen  any  classifier  jposts  an  action  to  the  output  interface  (which  happens  in  this  case  at  every  iteration,  since  there  are  no 
posts  to  the  message  board),  a  payoff  value  v  is  computed  according  to: 

where  respectively,  chosen  rewrard  amounts  for  moving  closer  to  the  goal,  pointing  more  directly 

at  the  goal,  and  crashing  into  an  obstacle.  Flags  /goah  fang/e*  ^^/cras/t  set  to  1  if  these  three  conditions  are  detected;  0 

otherwise.  The  first  two  terms  of  the  function  above  reflects  a  reward  of  IF  if  a  good  condition  is  achie\'ed,  and  a  penalty  of 
0.5  if  the  result  is  bad.  Crashing  is  always  penalized. 

Whenever  an  action  is  taken,  a  set  ^  is  created  using  the  posting  ciassifier(s)  C  as  follows: 

A  -  En  {i:a.  «  for  any  e  Q 

Then  wc  use  the  reward  distribution  function  to  adjust  the  strength  Sf  of  each  classifier 


^;(0  = 


A  ifisA 
if/^A 


where  is  a  penalty  faaor  that  is  (optionally)  used  to  penalize  classifiers  that  are  in  £,  (their  conditions  match  the  sensors), 

but  whose  actions  are  not  the  same  as  the  winners*  actions.  In  the  experiments  given  here,  wc  use  =  0 ,  with  the 

reasoning  being  that  we  prefer  to  encourage  the  emergence  of  diverse  classifiers  relevant  to  any  input,  rather  than  force 
classifiers  to  agree  on  an  output  in  response  to  a  particular  input 

ror  the  discovery  component  we  apply  mutation  and  crossover  every  50  iterations,  so  that  the  reward  component  has  sufBcicnt 
time  to  separate  good  rules  from  bad  ones  between  genetic  operations.  To  perform  discovery,  we  select  one  parent  based  on 

^  t  ne  classifier  is  "sub-symbolic,*  meaning  that  it  learns  rules  that  are  unconstrained  by  any  syntax  that  the  user  is  prepared  to  translate  into 
engiish.  One  may  look  at  an  operational  LCS  and  not  be  able  to  decipher  its  rules. 
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rouJette-whesl  selection  considering  tlie  strengths  of  the  classifien.  We  then,  -.rith  probabUit>'  v  ,  select  a  second  parent  C. 
again  using  the  roulette  wheel.  If  a  second  parent  is  chosen,  a  crossover  is  penormed  between  Q  and  C,  at  a  random  bit 
position.  If  no  second  parent  is  selected,  the  first  parent  is  merely  replicated.  Tne  offspring  is  then  provnded  an  initial 
strength  by  sharing  the  strengths  equally  with  its  one  or  nvo  parents.  It  is  used  to  replace  a  weak  ciassmer  by  roulette-wheel 
se  ecaon  based  on  a  fitness  vector  wx\(Sj  (/))  -  S,  (t) .  This  prevents  the  strongest  classifiers  from  being  killed  to  make  room 

for  a  mere  mutation. 

Mutation  is  then  performed,  bit-wise,  on  the  entire  set  C,  with  probability  n  .  This  mutation  is  three-way,  such  that  a  0  might 
remain  0  (with  probability  I  -  p ),  or  with  probability  p ,  become  a  1  or  #  (50%  chance  of  each). 

4.  SIMULATION  RESULTS 

Shown  in  Figures  4(bMl)  are  three  trial  tuns  of  the  learning  classifier  system  controlling  the  animat  to  reach  the  goal.  These 

are  typical  tnaJ  runs  and  show  behavior  similar  to  the  hand-coded  version  (Figure  4(a)).  In  (b-d),  all  rules  are  initialized  at 
random. 


5.  EXPERIMENTAL  RESULTS 

To  test  real-time  performance  and  the  effects  of  sensor  and  actuator  errors,  a  learning  classifier  system  was  prognunmed  to 
control  Curly,  a  modified  RWI B12  robot  (12].  Curly  is  outfitted  with  a  Motorola  68HCII  based  microcontroller  with  32K  of 
e-xtemal  RAM  for  programming.  It  has  a  different  set  of  sensors  than  the  simulated  model:  First,  it  has  three  diffiise-reflective 
obstacle  deteaors  with  ranges  of  approx.  10cm,  mounted  at  dead-center  and  approx.  30®  to  either  side.  It  also  has  a  long-range 
imrared  <^^tor  that  detects  only  modulated  infrared  from  a  special  source.  This  source  "beacon"  represents  the  goal  and  emits 
at  0  KHz  modulated  by  160  Hz.  Last,  it  has  an  ultrasonic  range  finder  mounted  so  that  it  points  in  the  same  direction  as 
the  beacon  detector.  This  sensor  is  used  by  the  reinforcement  component  to  determine  progress  toward  the  goal,  but  its  range 
readings  are  not  available  to  the  LCS.  thereby  making  the  sensing  resolution  of  Curly  roughly  equivalent  to  the  simulated 
model.  Figure  5  shows  Curly's  sensor  apparatus. 


rigure  5.  Curly,  the  modified  B12,  with  sensor  ranges. 

The  LCS  is  surprisingly  simple  to  program  on  a  real  robot,  and  can  be  done  with  few  lines  of  assembly  or  C  code.  xMost  of  die 
extensive  programming  necessary  for  the  simulation  runs  was  in  simulating  the  robot  itself  and  its  environment,  as  well  as  a 
user  interface  [5].  The  actual  mechanism  of  the  classifier  system  (Figure  1)  is  quite  efficient,  with  the  genetic  algorithm  being 
probably  the  most  timeHronsuming.  We  note  that  Curly  is  not  a  treaded  vehicle,  as  is  the  simulated  model,  but  is  instead 
s>Tichro-drive.  The  natural  outpuu  are  therefore  not  righc_wheel  and  left_wheel.  but  translate  and  rotate.  The 
controller,  although  it  now  writes  to  these  two  actuator  behaviors,  is  unaltered.  Sensorimotor  behaviors  are  learned  entirely 
from  uninterprectcd  data»  so  the  new  kinexnatics  are  learned  automatically. 


Vith  an  LCS  programmed  on  Guriy,  Figure  6  shows  a  representative  search  trial.  Tabulated  data  over  manv  trial  runs  is  not 
presented  here  because  the  variance  of  the  results  (number  of  steps  to  reach  the  goal)  is  veiy  hJgh,  and  few  trends,  as  yet,  are 
apparent.  E.-cpenments  are  now  underway  to  repeat  the  trials  with  an  operational  BBA  and  with  Q-learmng  in  the  hopes  that  a 
learning  curv’e  will  be  more  readily  apparent. 


figure  6.  Trial  run  for  a  mobile  robot  in  an  unknown,  unstructured  environment.  The  environment  is  a  clunered  laboratory 
with  rectangular  and  ellipsoidal  obstacles. 


6.  THE  DISTRIBUTED  LEARNING  CLASSIFIER  SYSTEM 

•A.  popular  extension  to  reactive  mobile  robot  controllers  is  the  concept  of  a  distributed  network  of  mobile  robots.  With  this  in 
mind,  we  propose  the  distributed  learning  classifier  system  (DLCS).  The  DLCS  is  designed  with  the  primary  constraints  that  /) 
communications  should  be  kept  to  an  absolute  minimum  in  order  to  avoid  exacerbating  a  combinatorial  explosion  in  overhead 
and  rouung  requirements,  and  ii)  communicadons  should  be  limited  to  sporadic,  self-contained  messages  of  fixed  format  that 
.-nay  be  addressed  to  any  individual  or  simply  broadcast  to  the  popuJadon  as  a  whole.  It  is  widely  noted  that  simple  broadcast 
communicadons  can  be  very  useful  in  muld-robot  coordinadon  [IJ. 

The  architecture  we  propose  is  depicted  in  Figure  7  below.  In  the  architecture,  two  extra  connecdons  are  added  to  the  separate 
learning  dasher  systems.  First,  a  transmit  queue  is  offered  as  a  third  opdon  for  output  messages.  That  is,  acdons  may  be 
posted  to  the  internal  message  board,  to  the  output  interface,  or  to  the  transmit  queue,  which  places  it  on  the  communicadons 
tnedium.  Second,  a  receive  queue  is  added  to  the  message  board  so  that  it  can  take  data  from  the  input  interface  or  classifier 
list,  as  before,  or  from  the  communicadons  bus. 

If  the  assumed  communicadons  medium  is  pierfect,  this  architecture  may  be  viewed  as  a  concatenadon  of  each  message  board 
and  each  classifier  list,  since  all  posdngs  to  or  from  one  are  available  to  another.  The  DLCS  would  thus  act  as  a  single, 
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distributed  classifier  s>'stem.  In  realit)-,  though,  the  communications  channel  is  a  protected  resource,  and  it  may  not  even  be 
desirable  for  all  classifiers  to  be  so  tightly  coupled.  An  alternative  would  therefore  be  a  rationing  of  the  communications 
channel  such  that  each  classifier  peribrms  its  duties  as  an  individual,  but  selected  (/.e.,  particularly  strong)  classifiers  could 
share  acuons  with  others  through  net  broadcast.  A  second  option,  which  implements  the  "learning  by  watc^ng"  paradigm  for 
distnbuted  learning  [15]  would  be  to  allow  entire  rules  (condition  and  action)  to  be  shared,  based  on  their  relative  strengths. 


Communications  Bus 


^  Agent  2  Agent  3 


figure  7.  .Architecture  of  the  DLCS.  Agents  may  selectively  post  to  and  read  from  the  communications  channel,  implemented 
as  a  distributed  message  board 

Such  a  DLCS  is  currently  under  study  and  is  being  adapted  to  a  number  of  distributed  optimization  problems  as  well. 
Simulations  with  multiple  mobile  robots  are  presented  in  [5]. 

7.  CONCLUSIONS 

•Although  it  is  clear  that  the  LCS  forms  an  appropriate  architecture  for  the  e-xecution  of  reactive  ruies  in  mobile  robots,  real- 
world  e.xperiments  are  inconclusive  as  to  its  learning  powers.  Although  the  simulated  behavior  was  quite  successful, 
e.v'periments  with  real  robots  shows  a  very  weak  learning  curve.  We  note,  however,  that  persistent  learning  is  due,  in  part,  to  a 
successful  apporuonment  of  credit  component,  which  we  have  not  used  in  this  study.  Pan  of  the  reason  for  this  was  our 
observation,  under  simulated  scenarios,  that  the  BBA  successfully  maintains  a  sequence  of  classifiers  whose  actions  are 
appropriately  rewarded,  but  it  is  not  very  good  at  working  in  conjunction  with  the  discovery  component  to  generate  viable 
sequ^nc.s.  There  are  some  indications  that  evolution  of  teams  of  rules  and  specialized  cvolutionarv  operators  may  be  more 
usehil  in  this  regard  [2,  13 1. 


Instead,  an  analysis  of  the  strengths  vs.  time  for  individual  mles  (not  shown  here)  indicates  that  rules  live  a  relatively  short  life 
under  the  evoludoi^  operators.  Our  robot  lives  in  a  sensory-sparse  environment  and  uses  a  fast  rate  of  learning  (high 
mutation  probabilities  of  0.1  -  0.3)  to  generate  strong  rules  that  "live  for  the  moment."  At  any  given  time,  only  two  or  three 
rules,  on  average,  have  significant  strengths,  after  which- their  usefulness  wanes,  they  die  and  are  quickly  replaced.  We 
therefore  observe  that  we  may  use  considerably  fewer  than  the  number  of  deterministic  hand-coded  Boolean  rules  (seven) 

needed  in  our  example,  as  long  as  we  trade  them  when  they  become  irrelevant  The  robots  with  this  controller  are  extremely 
situation-oriemed. 


Future  work  is  concentrating  on  the  DLCS,  as  this  is  an  entirely  unexplored  area  for  multiple  agent  coordination.  The 
inherently  message^iriented  nature  of  the  system  translates  directly  to  existing  communications  protocols,  and  promises  speed¬ 
up  through  parallelism  for  computauonal  optimization  tasks  in  general. 
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